Parsing Complex Nested XML Data in Python

Published: 2023-12-11 17:37:24

In Python, the standard-library xml.etree.ElementTree module can parse complex nested XML data. The simple example below shows how to parse a nested XML structure.

Suppose we have the following XML data:

<data>
    <country name="China">
        <province name="Beijing">
            <city>
                <name>Beijing</name>
                <population>2154</population>
            </city>
            <city>
                <name>Tianjin</name>
                <population>1568</population>
            </city>
        </province>
        <province name="Shanghai">
            <city>
                <name>Shanghai</name>
                <population>2423</population>
            </city>
        </province>
    </country>
    <country name="United States">
        <province name="California">
            <city>
                <name>Los Angeles</name>
                <population>3999</population>
            </city>
            <city>
                <name>San Francisco</name>
                <population>883</population>
            </city>
        </province>
        <province name="New York">
            <city>
                <name>New York City</name>
                <population>8392</population>
            </city>
        </province>
    </country>
</data>

First, import the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

Next, use the ET.parse() function to parse the XML into an ElementTree object (this assumes the data above is saved in a file named data.xml):

tree = ET.parse('data.xml')
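If the XML is already held in a string rather than a file, ET.fromstring() is one possible alternative. It returns the root Element directly instead of an ElementTree, so no getroot() call is needed. The following is a minimal sketch that assumes a shortened copy of the sample data; the names xml_text and root_from_string are illustrative only and do not appear elsewhere in this article.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data, held in a string (illustrative only).
xml_text = """
<data>
    <country name="China">
        <province name="Shanghai">
            <city>
                <name>Shanghai</name>
                <population>2423</population>
            </city>
        </province>
    </country>
</data>
"""

# fromstring() returns the root Element itself, not an ElementTree wrapper.
root_from_string = ET.fromstring(xml_text)
print(root_from_string.tag)
```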

Then call tree.getroot() to get the root element of the XML tree:

root = tree.getroot()

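The root variable gives access to the root element's children and attributes. root.tag returns the root element's tag name (here, data), and root.attrib returns its attributes as a dictionary (here an empty one, {}, because the <data> element has no attributes).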

print(root.tag)  # Output: data
print(root.attrib)  # Output: {}

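To walk the XML tree, use the root.iter() method. It returns an iterator over the element itself and all of its descendants at any depth, in document order, and it can be consumed with a for loop. Passing a tag name such as 'country' restricts the iteration to elements with that tag. Each element exposes tag, attrib, and text for its tag name, attributes, and text content.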

for country in root.iter('country'):
    country_name = country.attrib['name']
    print(country_name)
    for province in country.iter('province'):
        province_name = province.attrib['name']
        print('\t' + province_name)
        for city in province.iter('city'):
            city_name = city.find('name').text
            population = city.find('population').text
            print('\t\t' + city_name + ': ' + population)

The code above prints the following:

China
    Beijing
        Beijing: 2154
        Tianjin: 1568
    Shanghai
        Shanghai: 2423
United States
    California
        Los Angeles: 3999
        San Francisco: 883
    New York
        New York City: 8392

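In the example above, root.iter('country') yields every country element in the tree. Calling iter('province') on each country yields the province elements beneath it, and calling iter('city') on each province yields its city elements. Finally, find() locates the name and population children of each city, and the text attribute returns their text content. Note that iter() searches all descendants, not only direct children; it gives the expected result here because no element in this data contains a nested element with the same tag.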
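When only direct children are wanted, findall() with a bare tag name is a possible alternative to iter(), since it does not descend further than one level. The sketch below uses it to collect the same data into a nested dictionary and converts each population to an int. The helper name build_population_index and the shortened SAMPLE string are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data (illustrative only).
SAMPLE = """
<data>
    <country name="China">
        <province name="Beijing">
            <city><name>Beijing</name><population>2154</population></city>
            <city><name>Tianjin</name><population>1568</population></city>
        </province>
        <province name="Shanghai">
            <city><name>Shanghai</name><population>2423</population></city>
        </province>
    </country>
</data>
"""


def build_population_index(root):
    """Return {country: {province: {city: population_as_int}}}."""
    index = {}
    # findall("tag") matches direct children only, unlike iter("tag").
    for country in root.findall("country"):
        provinces = index.setdefault(country.get("name"), {})
        for province in country.findall("province"):
            cities = provinces.setdefault(province.get("name"), {})
            for city in province.findall("city"):
                # findtext() returns the text of the first matching child.
                cities[city.findtext("name")] = int(city.findtext("population"))
    return index


index = build_population_index(ET.fromstring(SAMPLE))
print(index["China"]["Beijing"])
```

Using get() instead of attrib['name'] means a missing attribute yields None rather than raising KeyError; whether that is preferable depends on how strictly the input should be validated.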

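This is only a simple example of parsing a nested XML structure. Real projects may involve more complex XML data, which calls for a deeper look at the other features of the xml.etree.ElementTree module.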
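One such feature is the limited XPath subset accepted by find() and findall(), which can select elements by attribute value or by the text of a child element without writing nested loops. The following is a minimal sketch under the assumption that the data has the same shape as the sample above; the SAMPLE string and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data (illustrative only).
SAMPLE = """
<data>
    <country name="United States">
        <province name="California">
            <city><name>Los Angeles</name><population>3999</population></city>
            <city><name>San Francisco</name><population>883</population></city>
        </province>
        <province name="New York">
            <city><name>New York City</name><population>8392</population></city>
        </province>
    </country>
</data>
"""

root = ET.fromstring(SAMPLE)

# [@name='...'] keeps only province elements whose name attribute matches;
# the trailing /city then selects their direct city children.
california_cities = [
    city.findtext("name")
    for city in root.findall(".//province[@name='California']/city")
]

# [name='...'] keeps only city elements that have a <name> child
# whose complete text equals the given value.
nyc = root.find(".//city[name='New York City']")
nyc_population = int(nyc.findtext("population"))

print(california_cities)
print(nyc_population)
```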