An API query returned an XML tree, and we need to extract specific values from it, for example LinksInCount.
Solution
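You can parse the XML tree with lxml, a Python XML library, and extract the values you need. The steps are:
- Install lxml:
    pip install lxml
- Import lxml.etree (the standard-library xml.etree.ElementTree is not needed in addition):
    import lxml.etree as ET
- Parse the XML string into an element tree: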
    xml_string = """
    <aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"> <aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
    <aws:OperationRequest> <aws:RequestId>5486794a-0d03-4d47-a45b-e95764c3f0ee</aws:RequestId> </aws:OperationRequest>
    <aws:UrlInfoResult> <aws:Alexa>
    <aws:ContentData> <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl> <aws:Asin>B00006D2TC</aws:Asin> <aws:SiteData> <aws:Title>Yahoo!</aws:Title> <aws:Description>Personalized content and search options. Chatrooms, free e-mail, clubs, and pager.</aws:Description> <aws:OnlineSince>18-Jan-1995</aws:OnlineSince> </aws:SiteData> <aws:Speed> <aws:MedianLoadTime>2242</aws:MedianLoadTime> <aws:Percentile>51</aws:Percentile> </aws:Speed> <aws:AdultContent>no</aws:AdultContent> <aws:Language> <aws:Locale>en</aws:Locale> </aws:Language> <aws:LinksInCount>76894</aws:LinksInCount> <aws:OwnedDomains> <aws:OwnedDomain> <aws:Domain>yahooligans.com</aws:Domain> <aws:Title>yahooligans.com</aws:Title> </aws:OwnedDomain> </aws:OwnedDomains> </aws:ContentData>
    <aws:Related> <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl> <aws:Asin>B00006D2TC</aws:Asin> <aws:RelatedLinks> <aws:RelatedLink> <aws:DataUrl type="canonical">aol.com/</aws:DataUrl> <aws:NavigableUrl>http://aol.com/</aws:NavigableUrl> <aws:Asin>B00006ARD3</aws:Asin> <aws:Relevance>301</aws:Relevance> </aws:RelatedLink> </aws:RelatedLinks> <aws:Categories> <aws:CategoryData> <aws:Title>On the Web/Web Portals</aws:Title> <aws:AbsolutePath>Top/Computers/Internet/On_the_Web/Web_Portals</aws:AbsolutePath> </aws:CategoryData> </aws:Categories> </aws:Related>
    <aws:TrafficData> <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl> <aws:Asin>B00006D2TC</aws:Asin> <aws:Rank>1</aws:Rank> <aws:UsageStatistics> <aws:UsageStatistic> <aws:TimeRange> <aws:Days>1</aws:Days> </aws:TimeRange> <aws:Rank> <aws:Value>1</aws:Value> <aws:Delta>+0</aws:Delta> </aws:Rank> <aws:Reach> <aws:Rank> <aws:Value>2</aws:Value> <aws:Delta>+0</aws:Delta> </aws:Rank> <aws:PerMillion> <aws:Value>252,500</aws:Value> <aws:Delta>-1%</aws:Delta> </aws:PerMillion> </aws:Reach> <aws:PageViews> <aws:PerMillion> <aws:Value>51,400</aws:Value> <aws:Delta>-1%</aws:Delta> </aws:PerMillion> <aws:Rank> <aws:Value>1</aws:Value> <aws:Delta>+0</aws:Delta> </aws:Rank> <aws:PerUser> <aws:Value>13.7</aws:Value> <aws:Delta>-1%</aws:Delta> </aws:PerUser> </aws:PageViews> </aws:UsageStatistic> </aws:UsageStatistics> </aws:TrafficData>
    </aws:Alexa> </aws:UrlInfoResult>
    <aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"> <aws:StatusCode>Success</aws:StatusCode> </aws:ResponseStatus>
    </aws:Response> </aws:UrlInfoResponse>
    """
    tree = ET.fromstring(xml_string)
- Use an XPath-style path expression to find the element you need:
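    # LinksInCount sits inside aws:Response, which rebinds the aws prefix to the awis namespace URI.
    # find() on an element needs a relative path, so the expression starts with ".//" rather than "//".
    links_in_count = tree.find(".//{http://awis.amazonaws.com/doc/2005-07-11}LinksInCount")
- Read the element's text:
    print(links_in_count.text)
Output:
    76894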
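
The same lookup can be written with a prefix-to-URI map instead of spelling out the `{uri}tag` form. The following is a minimal sketch, not the original author's code: it assumes the standard-library `xml.etree.ElementTree` (the same `find` call with a namespace map should also work with `lxml.etree`), uses a trimmed-down hypothetical sample that keeps only the namespaces and the `LinksInCount` element from the response above, and introduces a helper name, `extract_links_in_count`, purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the API response (same namespace layout).
SAMPLE = """
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
  <aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
    <aws:UrlInfoResult>
      <aws:Alexa>
        <aws:ContentData>
          <aws:LinksInCount>76894</aws:LinksInCount>
        </aws:ContentData>
      </aws:Alexa>
    </aws:UrlInfoResult>
  </aws:Response>
</aws:UrlInfoResponse>
"""

# The prefix name "awis" is our own choice; only the URI has to match the document.
NAMESPACES = {"awis": "http://awis.amazonaws.com/doc/2005-07-11"}


def extract_links_in_count(xml_text):
    """Return LinksInCount as an int, or None if the element is missing or empty."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants of the root element.
    node = root.find(".//awis:LinksInCount", NAMESPACES)
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


print(extract_links_in_count(SAMPLE))
```

Returning `None` when the element is absent avoids the `AttributeError` that `.text` would raise on a failed lookup, which is the usual symptom of a namespace URI that does not match the element's actual scope.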