Searching and Traversing XML Files with xml.etree.ElementTree (formerly cElementTree)

Published: 2023-12-16 08:05:53

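In Python, the ElementTree API parses and processes XML files by exposing the data as a tree of elements. It was historically available as xml.etree.cElementTree, but that alias was deprecated in Python 3.3 and removed in Python 3.9. On current Python versions, import xml.etree.ElementTree instead; it offers the same API and has used the fast C implementation automatically, whenever one is available, since Python 3.3. Below is an example of traversing and searching an XML file with this API.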

Suppose we have an XML file named data.xml with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book category="children">
        <author>Tove Jansson</author>
        <title>The Moomins and the Great Flood</title>
        <year>1945</year>
        <price>9.99</price>
    </book>
    <book category="fiction">
        <author>Harper Lee</author>
        <title>To Kill a Mockingbird</title>
        <year>1960</year>
        <price>12.99</price>
    </book>
    <book category="fiction">
        <author>J.R.R. Tolkien</author>
        <title>The Lord of the Rings</title>
        <year>1955</year>
        <price>19.99</price>
    </book>
</bookstore>

We can use the ElementTree API to traverse and search this XML file. Here is an example program:

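import xml.etree.ElementTree as ET

# Parse the XML file
tree = ET.parse('data.xml')
root = tree.getroot()

# Traverse the element tree recursively
def traverse_element(element, indent=0):
    # Print the tag name and attributes
    print(f'{"  " * indent}Tag: {element.tag}, Attributes: {element.attrib}')

    # Print the text content, skipping whitespace-only text
    # (the newlines and indentation between child elements)
    text = (element.text or '').strip()
    if text:
        print(f'{"  " * (indent + 1)}Text: {text}')

    # Recurse into the child elements
    for child in element:
        traverse_element(child, indent + 1)

# Start the traversal at the root element
traverse_element(root)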

Running the program above produces the following output:

Tag: bookstore, Attributes: {}
  Tag: book, Attributes: {'category': 'children'}
    Tag: author, Attributes: {}
      Text: Tove Jansson
    Tag: title, Attributes: {}
      Text: The Moomins and the Great Flood
    Tag: year, Attributes: {}
      Text: 1945
    Tag: price, Attributes: {}
      Text: 9.99
  Tag: book, Attributes: {'category': 'fiction'}
    Tag: author, Attributes: {}
      Text: Harper Lee
    Tag: title, Attributes: {}
      Text: To Kill a Mockingbird
    Tag: year, Attributes: {}
      Text: 1960
    Tag: price, Attributes: {}
      Text: 12.99
  Tag: book, Attributes: {'category': 'fiction'}
    Tag: author, Attributes: {}
      Text: J.R.R. Tolkien
    Tag: title, Attributes: {}
      Text: The Lord of the Rings
    Tag: year, Attributes: {}
      Text: 1955
    Tag: price, Attributes: {}
      Text: 19.99

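This output shows the tag name, attributes, and text content of every element in the XML file. The program recursively visits each element's children and indents the results to show the hierarchy.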
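
When the nesting depth does not matter, the same walk can be written without recursion, because calling iter() with no argument visits every element of a subtree in document order. The following is a minimal sketch under a few assumptions: the XML is an inline, shortened copy of the sample data so that the snippet runs without data.xml, and collect_text is a hypothetical helper name introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample data (an assumption, so that the
# sketch runs without data.xml on disk)
XML = """<bookstore>
    <book category="children">
        <author>Tove Jansson</author>
        <price>9.99</price>
    </book>
    <book category="fiction">
        <author>Harper Lee</author>
        <price>12.99</price>
    </book>
</bookstore>"""

root = ET.fromstring(XML)

def collect_text(root):
    """Return (tag, text) pairs for every element with non-blank text."""
    pairs = []
    # iter() with no tag argument walks the whole subtree in document order
    for element in root.iter():
        text = (element.text or '').strip()
        if text:
            pairs.append((element.tag, text))
    return pairs

pairs = collect_text(root)
for tag, text in pairs:
    print(f'{tag}: {text}')
```

The flat loop loses the indentation that the recursive version provides, so it fits best when only the leaf values are of interest.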

Besides traversing the XML file, we can also search for specific elements. For example, we can find every book element and extract its author and price. Here is an example program:

import xml.etree.ElementTree as ET

# Parse the XML file
tree = ET.parse('data.xml')
root = tree.getroot()

# Find every book element and print its author and price
for book in root.iter('book'):
    author = book.find('author').text
    price = book.find('price').text
    print(f'Author: {author}, Price: {price}')

Running the program above produces the following output:

Author: Tove Jansson, Price: 9.99
Author: Harper Lee, Price: 12.99
Author: J.R.R. Tolkien, Price: 19.99

This program retrieves only the text of the author and price child elements of each book element and prints it.
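
One caveat is that find() returns None when a child element is missing, so calling .text on the result would raise AttributeError. A possible way to guard against this is findtext(), which accepts a default value. The sketch below is illustrative only: the inline XML is a hypothetical variation of the sample data in which the second book has no price element, and summarize_books and the placeholder strings are names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical variation of the sample data: the second book has no <price>
XML = """<bookstore>
    <book category="children">
        <author>Tove Jansson</author>
        <price>9.99</price>
    </book>
    <book category="fiction">
        <author>Harper Lee</author>
    </book>
</bookstore>"""

root = ET.fromstring(XML)

def summarize_books(root):
    """Build one summary line per book, tolerating missing child elements."""
    rows = []
    for book in root.iter('book'):
        # findtext() returns the default instead of failing when no child matches
        author = book.findtext('author', default='(unknown)')
        price = book.findtext('price', default='n/a')
        rows.append(f'Author: {author}, Price: {price}')
    return rows

rows = summarize_books(root)
for row in rows:
    print(row)
```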

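In summary, the ElementTree API makes it easy to parse, traverse, and search XML files. The ET.parse() function parses an XML file, and the getroot() method of the resulting tree returns the root element. By traversing the element tree recursively, we can access and process the XML data. In addition, methods such as iter(), find(), and findall() let us search for specific elements.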
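
To illustrate the difference between find() and findall(), here is a minimal sketch that selects books by attribute. It relies on the limited XPath subset that ElementTree supports, including the [@attrib='value'] predicate. The XML is an inline, shortened copy of the sample data (an assumption, so that the snippet runs without data.xml), and the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample data (an assumption, so that the
# sketch runs without data.xml on disk)
XML = """<bookstore>
    <book category="children">
        <title>The Moomins and the Great Flood</title>
    </book>
    <book category="fiction">
        <title>To Kill a Mockingbird</title>
    </book>
    <book category="fiction">
        <title>The Lord of the Rings</title>
    </book>
</bookstore>"""

root = ET.fromstring(XML)

# find() returns only the first matching child (or None if nothing matches)
first_book = root.find('book')
first_category = first_book.get('category')

# findall() returns a list of every match; the predicate filters by attribute
fiction_books = root.findall("./book[@category='fiction']")
fiction_titles = [book.findtext('title') for book in fiction_books]

print(first_category)
print(fiction_titles)
```

Note that find() and findall() with a bare tag name match only direct children of the element they are called on, whereas iter() searches the whole subtree.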