Shortcut: Ctrl+Shift+X
Syntax
<body>
<div>
<a href="/ershoufang/dongcheng/" title="北京东城在售二手房 ">东城</a>
<a href="/ershoufang/xicheng/" title="北京西城在售二手房 ">西城</a>
<a href="/ershoufang/chaoyang/" title="北京朝阳在售二手房 ">朝阳</a>
</div>
</body>
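/: parent-child relationship, e.g. div/a[2]. To select the nth matching tag, append [n] to the tag name; positions start at 1.
//: ancestor-descendant relationship; the matched node may be nested at any depth and need not be a direct child.
@: locates a tag by attribute, e.g. //a[@title="北京西城在售二手房 "]
@attribute-name: extracts the value of the named attribute from the tag, e.g. //a[@title="北京西城在售二手房 "]/@title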
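The four rules can be tried directly against the sample HTML with lxml. This is a minimal sketch, assuming the sample markup is held in a string; the variable names (sample, tree, and so on) are illustrative and not part of the original notes.

```python
from lxml import etree

# Sample markup from the Syntax section, held in a string for illustration.
sample = '''
<body>
<div>
<a href="/ershoufang/dongcheng/" title="北京东城在售二手房 ">东城</a>
<a href="/ershoufang/xicheng/" title="北京西城在售二手房 ">西城</a>
<a href="/ershoufang/chaoyang/" title="北京朝阳在售二手房 ">朝阳</a>
</div>
</body>
'''

tree = etree.HTML(sample)

# /  : direct child; [2] picks the second <a> under <div> (positions start at 1)
second = tree.xpath('//div/a[2]/text()')

# // : every <a> below <body>, however deeply nested
all_links = tree.xpath('//body//a/text()')

# @  : filter by attribute value (note the trailing space inside the title)
xicheng = tree.xpath('//a[@title="北京西城在售二手房 "]/text()')

# @attribute-name at the end of the path returns the attribute value itself
title = tree.xpath('//a[@title="北京西城在售二手房 "]/@title')

print(second, all_links, xicheng, title)
```

xpath() always returns a list here, even when only one node matches, so take element [0] when a single value is needed.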
Code
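from lxml import etree
import requests

# Download the page (the timeout keeps the request from hanging indefinitely)
url = 'http://xczx.news.cn/mlxc.htm'
response = requests.get(url=url, timeout=10)
report = response.text

# Parse the HTML text into an element tree
html_tree = etree.HTML(report)

# Text of every <a> directly under a <span> anywhere inside div#content-list
tags = html_tree.xpath('//div[@id="content-list"]//span/a/text()')

# Concatenate the text fragments and print the result
text = ''.join(tags)
print(text)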
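To check an XPath expression without touching the network, the parsing step can be pulled into a function and run against a small inline document. This is a sketch only: the function name extract_link_text and the demo markup are hypothetical stand-ins, and the real page structure may differ. Only the id "content-list" and the XPath expression come from the code above.

```python
from lxml import etree

def extract_link_text(html: str) -> str:
    """Return the text of each <a> directly under a <span> inside div#content-list, one per line."""
    tree = etree.HTML(html)
    parts = tree.xpath('//div[@id="content-list"]//span/a/text()')
    # strip() drops the whitespace that often surrounds link text
    return '\n'.join(p.strip() for p in parts if p.strip())

# Hypothetical stand-in markup; the real page structure may differ.
demo = '''
<div id="content-list">
  <ul>
    <li><span><a href="/a.htm">First headline</a></span></li>
    <li><span><a href="/b.htm"> Second headline </a></span></li>
  </ul>
</div>
<div id="sidebar"><span><a href="/c.htm">Ignored</a></span></div>
'''

print(extract_link_text(demo))
```

For the live page, the same function could be called with response.text from requests.get.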