Downloading and Saving Web Page Content with a Python Downloader

Published: 2023-12-17 13:06:34

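Python offers several libraries and modules for downloading web page content and saving it locally. This article covers two common approaches, the third-party requests library and the standard-library urllib module, with a usage example for each.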

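1. Using the requests library to download and save:

requests is a widely used HTTP library for sending HTTP requests and handling responses. With it, you can download a web page and save it to a local file.

First, install the requests library with pip: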

pip install requests

The following function and example code download a web page with requests and save it:

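import requests

def download(url, file_path):
    # A timeout keeps the call from waiting indefinitely on an unresponsive server.
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        # Write the raw bytes so the saved file matches what the server sent.
        with open(file_path, 'wb') as file:
            file.write(response.content)
        print("Web page content saved to:", file_path)
    else:
        print("Download failed, status code:", response.status_code)

# Example: download and save the Baidu home page
url = "https://www.baidu.com"
file_path = "baidu.html"
download(url, file_path)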

Running the example saves the content of the Baidu home page to baidu.html in the current directory.
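The function above reads the whole response into memory before writing it. For larger pages or files, a streamed variant may be preferable. The following is a minimal sketch under stated assumptions, not the article's own method: the function name download_streamed, the 10-second timeout, and the 8192-byte chunk size are illustrative choices.

```python
import requests


def download_streamed(url, file_path, timeout=10, chunk_size=8192):
    """Stream the body of `url` into `file_path`; return True on success."""
    try:
        # stream=True defers reading the body until iter_content() is called.
        with requests.get(url, stream=True, timeout=timeout) as response:
            # Raise requests.HTTPError for 4xx and 5xx status codes.
            response.raise_for_status()
            with open(file_path, "wb") as file:
                # Write the body piece by piece instead of all at once.
                for chunk in response.iter_content(chunk_size=chunk_size):
                    file.write(chunk)
    except requests.RequestException as error:
        # Covers connection errors, timeouts, and HTTP error statuses.
        print("Download failed:", error)
        return False
    print("Web page content saved to:", file_path)
    return True
```

It would be called the same way as the original function, for example download_streamed("https://www.baidu.com", "baidu.html"). Because the status check runs before the file is opened, an HTTP error leaves no output file behind; a failure partway through the transfer can still leave a partial file.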

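2. Using the urllib module to download and save:

urllib is part of the Python standard library and provides a range of tools for working with URLs, so no installation is needed. It can also download web page content and save it to a file.

The following function and example code download a web page with urllib and save it: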

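import urllib.request

def download(url, file_path):
    try:
        # urlretrieve fetches the URL and writes the response body to file_path.
        urllib.request.urlretrieve(url, file_path)
        print("Web page content saved to:", file_path)
    except Exception as e:
        print("Download failed:", str(e))

# Example: download and save the Google home page
url = "https://www.google.com"
file_path = "google.html"
download(url, file_path)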

Running the example saves the content of the Google home page to google.html in the current directory.
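The Python documentation lists urlretrieve under its legacy interface, so new code may prefer urlopen. The sketch below shows one possible way to do the same download with urlopen, a timeout, and an explicit User-Agent header. The function name download_with_urlopen, the 10-second timeout, and the "Mozilla/5.0" header value are assumptions made for illustration; the header is included because some sites may reject urllib's default User-Agent.

```python
import shutil
import urllib.request


def download_with_urlopen(url, file_path, timeout=10):
    """Copy the body of `url` into `file_path`; return True on success."""
    # Assumed header value; adjust or remove it as the target site requires.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            with open(file_path, "wb") as file:
                # Copy the response to disk in blocks rather than all at once.
                shutil.copyfileobj(response, file)
    except OSError as error:
        # urllib.error.URLError, HTTPError, and socket timeouts all derive from OSError.
        print("Download failed:", error)
        return False
    print("Web page content saved to:", file_path)
    return True
```

A call such as download_with_urlopen("https://www.google.com", "google.html") mirrors the example above. Catching OSError instead of Exception keeps programming errors, such as a malformed URL string that raises ValueError, visible rather than reporting them as download failures.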

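Both methods download and save web page content. requests must be installed separately, while urllib ships with Python, so choose whichever better fits your needs.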