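I. Introduction to ElementTree
xml.etree.ElementTree is the XML module in Python's standard library. It is:
- ✅ Lightweight and efficient: simpler than a DOM parser and easier to use than a SAX parser
- ✅ Built in: no third-party packages to install
- ✅ Pythonic: the API follows ordinary Python conventions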
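II. Core Operations: A Quick Start
1. Parsing XML: reading and understanding
import xml.etree.ElementTree as ET

# Option 1: parse from a string; fromstring() returns the root Element directly
xml_string = '''
<bookstore>
    <book category="编程">
        <title>Python编程从入门到实践</title>
        <author>Eric Matthes</author>
        <year>2016</year>
        <price>89.00</price>
    </book>
</bookstore>
'''
root = ET.fromstring(xml_string)

# Option 2: parse from a file (assumes books.xml exists in the working directory);
# parse() returns an ElementTree, so call getroot() to get the root Element
tree = ET.parse('books.xml')
root = tree.getroot()
print(f"Root tag: {root.tag}")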
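The difference between the two entry points is easy to miss: `ET.fromstring()` hands back an `Element`, while `ET.parse()` hands back an `ElementTree` wrapper. The following is a minimal sketch of that difference; the sample XML is made up for illustration, and `io.BytesIO` stands in for a real file so the snippet runs without `books.xml` on disk.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data, not taken from any real file.
xml_text = (
    "<bookstore>"
    "<book category='programming'><title>Sample</title></book>"
    "</bookstore>"
)

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() accepts a filename or a file-like object and returns an ElementTree;
# BytesIO plays the role of an opened file here.
tree = ET.parse(io.BytesIO(xml_text.encode("utf-8")))
root_from_file = tree.getroot()

print(type(root_from_string).__name__)  # class name of the parsed root
print(type(tree).__name__)              # class name of the tree wrapper
print(root_from_string.tag == root_from_file.tag)
```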
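2. Traversing and accessing elements
# Iterate over the root's direct children, then over each child's children
for child in root:
    print(f"Child element: {child.tag}, attributes: {child.attrib}")
    for subchild in child:
        print(f"  {subchild.tag}: {subchild.text}")

# find() returns the first match, or None if nothing matches
title = root.find('book/title')
if title is not None:
    print(f"Title: {title.text}")

# findall() returns a list of every match
all_books = root.findall('book')
print(f"Found {len(all_books)} book(s)")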
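When some child elements or attributes may be missing, `findtext()` and `get()` with a default can replace the explicit `is not None` check. This is a small sketch under that assumption; the sample data and the `describe` helper are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Made-up data: the second book has no <year> and no category attribute.
root = ET.fromstring(
    "<bookstore>"
    "<book category='programming'><title>Book A</title><year>2016</year></book>"
    "<book><title>Book B</title></book>"
    "</bookstore>"
)

def describe(book):
    """Summarize one <book>, falling back to defaults for missing parts."""
    title = book.findtext("title", default="(untitled)")
    year = book.findtext("year", default="unknown")
    category = book.get("category", "uncategorized")
    return f"{title} ({year}, {category})"

summaries = [describe(book) for book in root.findall("book")]
for line in summaries:
    print(line)
```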
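1. Basic lookups
element = root.find('child')         # a direct child named "child"
element = root.find('.//child')      # any descendant named "child"
element = root.find('parent/child')  # a "child" under a direct child named "parent"
2. Lookups with namespaces
# Option 1: write the namespace URI in braces in front of the tag name
namespace = '{http://www.example.com/ns}'
element = root.find(f'.//{namespace}status')
# Option 2: map a prefix to the URI and pass the mapping as the second argument
ns = {'ns': 'http://www.example.com/ns'}
element = root.find('.//ns:status', ns)
3. XPath predicates (find() and findall() accept the same limited XPath subset)
element = root.find('.//item[@id]')        # item that has an id attribute
element = root.find('.//item[@id="123"]')  # item whose id equals "123"
element = root.find('.//name[.="John"]')   # name whose text is "John" (Python 3.7+); text() is not supported
element = root.find('.//item[1]')          # first item under its parent (positions are 1-based)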
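ElementTree implements only a subset of XPath, so it can help to try each predicate against a tiny document. The sketch below uses made-up data and only predicate forms listed in the standard library documentation; it also shows that the XPath `text()` function is rejected with a `SyntaxError`.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for trying out predicates.
root = ET.fromstring(
    "<items>"
    "<item id='123'><name>John</name></item>"
    "<item><name>Jane</name></item>"
    "<item id='456'><name>John</name></item>"
    "</items>"
)

with_id = root.findall(".//item[@id]")              # items that have an id attribute
exact_id = root.find(".//item[@id='123']")          # item whose id is "123"
named_john = root.findall(".//item[name='John']")   # items with a <name> child whose text is "John"
first_item = root.find("item[1]")                   # positions are 1-based
last_item = root.find("item[last()]")               # last() is supported

# text() is not part of the supported subset.
try:
    root.find(".//name[text()='John']")
    text_predicate_rejected = False
except SyntaxError:
    text_predicate_rejected = True

print(len(with_id), len(named_john), text_predicate_rejected)
```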
3. Creating an XML document
# Build the tree top-down: root element first, then children
root = ET.Element("bookstore")
book = ET.SubElement(root, "book")
book.set("category", "编程")
title = ET.SubElement(book, "title")
title.text = "Python核心编程"
author = ET.SubElement(book, "author")
author.text = "Wesley Chun"

# Wrap the root in an ElementTree and write it out as UTF-8 with an XML declaration
tree = ET.ElementTree(root)
tree.write("new_book.xml", encoding="utf-8", xml_declaration=True)
print("XML file created.")
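For more than one record, the same `Element` / `SubElement` calls can be wrapped in a loop. The following sketch assumes the books arrive as a list of dictionaries; `build_bookstore` and the dictionary keys are hypothetical names chosen for this example. `ET.indent()` only exists on Python 3.9 and later, so the call is guarded.

```python
import xml.etree.ElementTree as ET

def build_bookstore(books):
    """Build a <bookstore> element from a list of dicts (hypothetical input shape)."""
    root = ET.Element("bookstore")
    for info in books:
        # Extra keyword arguments to SubElement become attributes.
        book = ET.SubElement(root, "book", category=info["category"])
        ET.SubElement(book, "title").text = info["title"]
        ET.SubElement(book, "author").text = info["author"]
    return root

root = build_bookstore([
    {"category": "programming", "title": "Core Python Programming", "author": "Wesley Chun"},
])

# Pretty-print in place where available (Python 3.9+).
if hasattr(ET, "indent"):
    ET.indent(root, space="  ")

# encoding="unicode" returns a str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Serializing to a string first makes the result easy to inspect before writing it to disk with `ElementTree.write()`.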
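4. Modifying XML content
# Apply a 10% discount to every price and record it as an attribute
for price in root.iter('price'):
    old_price = float(price.text)
    price.text = f"{old_price * 0.9:.2f}"  # two decimals; avoids output like 80.10000000000001
    price.set('discount', '10%')

# Remove books published before 2010 (findall() returns a list, so removing inside the loop is safe)
for book in root.findall('book'):
    year = book.find('year')
    if year is not None and int(year.text) < 2010:
        root.remove(book)

# Without encoding='utf-8', non-ASCII text is written as numeric character references
tree.write('updated_books.xml', encoding='utf-8', xml_declaration=True)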
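The same edits can be tried end to end on an in-memory document before touching a real file. This is a minimal sketch with made-up data; it formats the discounted price to two decimals, which is an assumption about the desired output rather than something the steps above require.

```python
import xml.etree.ElementTree as ET

# Made-up data: one book from before 2010, one from after.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Old</title><year>2005</year><price>50.00</price></book>"
    "<book><title>New</title><year>2016</year><price>89.00</price></book>"
    "</bookstore>"
)

# Discount every price by 10% and mark the element.
for price in root.iter("price"):
    discounted = float(price.text) * 0.9
    price.text = f"{discounted:.2f}"
    price.set("discount", "10%")

# Drop books published before 2010; books without a <year> are kept.
for book in root.findall("book"):
    year = book.findtext("year")
    if year is not None and int(year) < 2010:
        root.remove(book)

remaining_titles = [b.findtext("title") for b in root.findall("book")]
new_price = root.find("book/price")
print(remaining_titles, new_price.text, new_price.attrib)
```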
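5. More ways to find elements
# ElementTree's XPath subset has no comparison operators, so "book[price>50]"
# silently matches nothing; filter on the numeric value in Python instead
expensive_books = [b for b in root.findall('book') if float(b.findtext('price', '0')) > 50]

# iter() walks the whole subtree, at any depth
for author in root.iter('author'):
    print(f"Author: {author.text}")

# Combine several conditions in ordinary Python
for book in root.findall('book'):
    price = float(book.find('price').text)
    category = book.get('category')
    if price > 80 and category == '编程':  # "编程" means "programming"
        print(f"Expensive programming book: {book.find('title').text}")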
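Because the XPath subset in ElementTree has no numeric comparison operators, price filters are usually written in Python. One way to package that is a small helper; `books_over` and its parameters are hypothetical names for this sketch, and the data is made up.

```python
import xml.etree.ElementTree as ET

def books_over(root, min_price, category=None):
    """Return <book> children priced above min_price, optionally limited to one category."""
    matches = []
    for book in root.findall("book"):
        price_text = book.findtext("price")
        if price_text is None:
            continue  # skip books without a price rather than failing
        if float(price_text) <= min_price:
            continue
        if category is not None and book.get("category") != category:
            continue
        matches.append(book)
    return matches

root = ET.fromstring(
    "<bookstore>"
    "<book category='programming'><title>A</title><price>89.00</price></book>"
    "<book category='fiction'><title>B</title><price>95.00</price></book>"
    "<book category='programming'><title>C</title><price>30.00</price></book>"
    "<book category='programming'><title>D</title></book>"
    "</bookstore>"
)

over_50 = [b.findtext("title") for b in books_over(root, 50)]
programming_over_80 = [b.findtext("title") for b in books_over(root, 80, "programming")]
print(over_50, programming_over_80)
```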
III. Practical Tips and Caveats
Handling namespaces
ns_xml = '''
<root xmlns:bk="http://example.com/books">
    <bk:book>
        <bk:title>Python学习手册</bk:title>
    </bk:book>
</root>
'''
# Map the prefix to its URI and pass the mapping to find()
namespaces = {'bk': 'http://example.com/books'}
root = ET.fromstring(ns_xml)
title = root.find('bk:book/bk:title', namespaces)
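A default namespace (`xmlns="..."` with no prefix) is a frequent source of "element not found" surprises, because every unprefixed tag in the document is still namespace-qualified. The sketch below, using a made-up document, shows the plain lookup failing and the qualified lookup succeeding; the prefix `bk` in the mapping is arbitrary and does not need to appear in the XML.

```python
import xml.etree.ElementTree as ET

# Made-up document that declares a default namespace.
ns_xml = (
    '<catalog xmlns="http://example.com/books">'
    "<book><title>Learning Python</title></book>"
    "</catalog>"
)
root = ET.fromstring(ns_xml)

# Tags are stored in {uri}name form.
print(root.tag)

# An unqualified path does not match namespaced elements.
plain = root.find("book")

# Either map a prefix of your choosing to the URI...
namespaces = {"bk": "http://example.com/books"}
qualified = root.find("bk:book/bk:title", namespaces)

# ...or spell out the {uri}name form directly.
braced = root.find("{http://example.com/books}book/{http://example.com/books}title")

print(plain, qualified.text, braced.text)
```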
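Handling large XML files
# Pattern 1: listen for both events. At 'start' only the tag and attributes are
# guaranteed to be available; children and text are complete only at 'end'.
for event, elem in ET.iterparse('large_file.xml', events=('start', 'end')):
    if event == 'start' and elem.tag == 'book':
        print(f"Started a book: {elem.attrib}")
    elif event == 'end' and elem.tag == 'book':
        print(f"Finished book: {elem.findtext('title')}")
        elem.clear()  # release the memory held by this element's children

# Pattern 2: listen only for 'end' events
context = ET.iterparse('large_file.xml', events=('end',))
for event, elem in context:
    if elem.tag == 'book':
        process_book(elem)  # process_book is a placeholder for your own function
        elem.clear()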
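The streaming pattern can be tried without a large file by feeding `iterparse()` a file-like object. This is a minimal sketch; the three-book document is generated in memory purely for illustration, and collecting titles into a list stands in for whatever per-book processing is needed.

```python
import io
import xml.etree.ElementTree as ET

# Generate a small document in memory; a real use case would pass a filename.
body = "".join(f"<book><title>Book {i}</title></book>" for i in range(3))
source = io.BytesIO(f"<bookstore>{body}</bookstore>".encode("utf-8"))

titles = []
for event, elem in ET.iterparse(source, events=("end",)):
    # At an 'end' event the element and all of its children are fully parsed.
    if elem.tag == "book":
        titles.append(elem.findtext("title"))
        # Drop the children and text so memory use stays flat.
        elem.clear()

print(titles)
```

Note that `elem.clear()` empties each `<book>` but leaves the empty element attached to the root, so for very large files the root itself still grows slowly.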
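IV. Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| ParseError (a subclass of SyntaxError) | Malformed XML | Check that the XML is complete and every tag is closed |
| AttributeError | Accessing .text or attributes on an element that does not exist (find() returned None) | Check `if elem is not None:` first |
| Garbled Chinese (non-ASCII) text | Encoding mismatch | Specify encoding='utf-8' when reading and writing |
| Element not found | Namespaces | Pass a prefix-to-URI mapping to find() / findall(), or use the {uri}tag form |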
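The first two rows of the table can be handled together in one defensive helper. The sketch below is one possible shape, not a required pattern; `safe_title` and its return strings are hypothetical. It relies on `ET.ParseError`, which the standard library raises for malformed XML.

```python
import xml.etree.ElementTree as ET

def safe_title(xml_text):
    """Return the first book title, or a short message describing what went wrong."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Malformed XML: unclosed tags, stray characters, empty input, ...
        return f"parse error: {exc}"

    title = root.find("book/title")
    if title is None:
        # find() returns None when nothing matches; .text on None would
        # raise AttributeError.
        return "no title"
    return title.text

print(safe_title("<bookstore><book><title>A</title></book></bookstore>"))
print(safe_title("<bookstore><book>"))
print(safe_title("<bookstore/>"))
```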
V. Quick Reference
| Task | Method | Example |
|---|---|---|
| Parse XML | ET.parse() | tree = ET.parse('file.xml') |
| Get the root | .getroot() | root = tree.getroot() |
| Find one element | .find() | elem = root.find('path') |
| Find all elements | .findall() | elems = root.findall('path') |
| Iterate over all | .iter() | for el in root.iter('tag'): |
| Get text | .text | text = elem.text |
| Get attributes | .attrib | attr = elem.attrib |
| Set an attribute | .set() | elem.set('key', 'value') |
| Create an element | ET.SubElement() | child = ET.SubElement(parent, 'tag') |