Parsing and Extracting Table Data from a Server Document with a Python server_document() Function

Published: 2023-12-25 21:38:45

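In Python, the BeautifulSoup library can be used to parse table data in a server document (HTML). Below is a server_document() function written in Python that parses and extracts the table data from such a document, followed by an example that demonstrates it.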

from bs4 import BeautifulSoup

def server_document(html_doc):
    # Create a BeautifulSoup object from the server document passed in
    soup = BeautifulSoup(html_doc, 'html.parser')

    # Find every table on the page
    tables = soup.find_all('table')

    table_data = []

    # Iterate over each table
    for table in tables:
        rows = table.find_all('tr')  # Find every row in the table

        table_rows = []

        # Iterate over each row
        for row in rows:
            cells = row.find_all('td')  # Find every data cell in the row

            row_data = []

            # Iterate over each cell
            for cell in cells:
                row_data.append(cell.text.strip())  # Extract the cell text and strip surrounding whitespace

            table_rows.append(row_data)  # Add the current row to the table

        table_data.append(table_rows)  # Add the current table to the result list

    return table_data

# Sample server document (HTML)
html_doc = """
<html>
<body>
    <table>
        <tr>
            <td>Apple</td>
            <td>Red</td>
            <td>Round</td>
        </tr>
        <tr>
            <td>Banana</td>
            <td>Yellow</td>
            <td>Long</td>
        </tr>
    </table>
    <table>
        <tr>
            <td>Grapes</td>
            <td>Purple</td>
            <td>Small</td>
        </tr>
        <tr>
            <td>Orange</td>
            <td>Orange</td>
            <td>Round</td>
        </tr>
    </table>
</body>
</html>
"""

# Call server_document() to parse the table data in the server document
table_data = server_document(html_doc)

# Print the extracted table data
for table in table_data:
    print("Table:")
    for row in table:
        print(row)
    print()  # Blank line between tables

The example code above prints the following output:

Table:
['Apple', 'Red', 'Round']
['Banana', 'Yellow', 'Long']

Table:
['Grapes', 'Purple', 'Small']
['Orange', 'Orange', 'Round']
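
Because server_document() only looks for td elements, any header cells marked up as th are left out of the result, and a header-only row comes back as an empty list. The following is a minimal sketch of one way to keep them. The function name extract_tables_with_headers and the sample markup are hypothetical and not part of the original example.

```python
from bs4 import BeautifulSoup


def extract_tables_with_headers(html_doc):
    """Hypothetical variant of server_document() that also keeps <th> cells."""
    soup = BeautifulSoup(html_doc, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = []
        for row in table.find_all("tr"):
            # Passing a list of tag names matches either tag, in document order.
            cells = row.find_all(["td", "th"])
            rows.append([cell.get_text(strip=True) for cell in cells])
        tables.append(rows)
    return tables


# Illustrative markup with a header row (an assumption, not from the article).
sample = """
<table>
    <tr><th>Fruit</th><th>Color</th></tr>
    <tr><td>Apple</td><td>Red</td></tr>
</table>
"""

print(extract_tables_with_headers(sample))
```

Here the header row is returned as the first row of the table, so the caller can decide whether to treat it as column names.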

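The code above stores the extracted table data in a nested list: the outer list holds one entry per table, each table is a list of rows, and each row is a list of cell strings. You can process the extracted data further as needed, for example by writing it to a CSV file or inserting it into a database.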
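
As an illustration of the CSV option, here is a minimal sketch that serializes each table with the standard csv module. The helper name tables_to_csv is hypothetical, and the table_data literal simply mirrors the shape that server_document() returns in the example above, so the snippet runs without BeautifulSoup.

```python
import csv
import io


def tables_to_csv(table_data):
    """Serialize each table to CSV text; returns one string per table."""
    outputs = []
    for table in table_data:
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerows(table)  # One CSV line per table row
        outputs.append(buffer.getvalue())
    return outputs


# Same shape as the value returned by server_document() in the example.
table_data = [
    [["Apple", "Red", "Round"], ["Banana", "Yellow", "Long"]],
    [["Grapes", "Purple", "Small"], ["Orange", "Orange", "Round"]],
]

for index, csv_text in enumerate(tables_to_csv(table_data), start=1):
    print(f"Table {index} as CSV:")
    print(csv_text)
```

To write real files instead of strings, the same csv.writer can be pointed at a file opened with open(path, "w", newline="", encoding="utf-8"), one file per table.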