Python's urllib and urllib2 Libraries: Differences and Usage

Published: 2024-01-17 06:52:10

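urllib and urllib2 are Python libraries for working with URLs. They provide functions and classes for sending HTTP requests, handling responses, and encoding URLs. This article explains how the two differ and how each is used.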

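I. The urllib library

1. Differences:

The split between urllib and urllib2 comes down to the Python version. Python 2.x spread URL handling across several standard-library modules, including urllib, urllib2, urlparse, and robotparser; Python 3.x reorganized them into a single urllib package. urllib3, despite its name, is a separate third-party package and is not part of the standard library in either version.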

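2. Usage:

In Python 3.x the urllib package contains four modules: urllib.request, urllib.error, urllib.parse, and urllib.robotparser. Each is described below.

a. The urllib.request module:

urllib.request sends HTTP requests and receives responses. It provides functions such as urlopen() and urlretrieve(): urlopen() opens a URL and returns a response object whose content can be read, and urlretrieve() downloads a URL to a local file.

Example 1: sending a GET request with urlopen()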

  from urllib.request import urlopen

  response = urlopen('http://www.example.com')
  html = response.read()
  print(html)
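
As a slightly fuller sketch, the response can be used as a context manager so that it is closed promptly, and the bytes returned by read() can be decoded to text. The helper name fetch_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration; the data: URL stands in for a real address so that the snippet runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    # The with-block closes the response even if read() raises.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, not str
        # Use the declared charset; fall back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or 'utf-8'
    return raw.decode(charset)


# A data: URL keeps the example offline; an http:// URL is used the same way.
text = fetch_text('data:text/plain;charset=utf-8,Hello%20urllib')
print(text)
```

Passing a timeout is a design choice worth keeping: without one, a stalled server can block the call indefinitely.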

Example 2: downloading a file with urlretrieve()

  from urllib.request import urlretrieve

  urlretrieve('http://www.example.com/file.txt', 'file.txt')
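
The Python documentation lists urlretrieve() under its legacy interface, so a download can also be written with urlopen() and shutil.copyfileobj(), which streams the body to disk in chunks. This is a minimal sketch rather than a drop-in replacement: the helper name download, the temporary directory, and the data: URL (used so the snippet runs offline) are all assumptions.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def download(url, dest_path, timeout=10):
    """Stream the body of `url` into `dest_path` (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response, open(dest_path, 'wb') as out:
        # copyfileobj copies in chunks, so large files are not held in memory.
        shutil.copyfileobj(response, out)
    return dest_path


# Offline demonstration: the data: URL stands in for a real file URL.
target = os.path.join(tempfile.mkdtemp(), 'file.txt')
download('data:text/plain,file%20contents', target)
with open(target, 'rb') as f:
    saved = f.read()
print(saved)
```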

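b. The urllib.error module:

urllib.error defines the exceptions raised while handling requests, such as URLError and HTTPError. HTTPError is a subclass of URLError, so catching URLError also catches HTTP error responses.

Example: catching and handling an exception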

  from urllib.request import urlopen
  from urllib.error import URLError

  try:
      response = urlopen('http://www.example.com')
      html = response.read()
      print(html)
  except URLError as e:
      print('An error occurred:', e.reason)
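
When the two cases need different handling, HTTPError has to be caught before URLError, because it is the more specific class. The sketch below illustrates the ordering; the helper name describe_fetch and the made-up scheme are assumptions, chosen so that a URLError can be triggered without network access.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def describe_fetch(url, timeout=10):
    """Return a short status string for `url` (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 'OK: read {} bytes'.format(len(response.read()))
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return 'HTTP error: {} {}'.format(e.code, e.reason)
    except URLError as e:
        # No usable response: DNS failure, refused connection, unknown scheme.
        return 'URL error: {}'.format(e.reason)


# An unsupported scheme raises URLError without touching the network.
failed = describe_fetch('unknownscheme://www.example.com')
succeeded = describe_fetch('data:text/plain,hello')
print(failed)
print(succeeded)
```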

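c. The urllib.parse module:

urllib.parse handles URL encoding and URL parsing. It provides functions such as quote(), unquote(), and urlparse(): quote() percent-encodes a string, unquote() reverses that encoding, and urlparse() splits a URL into its components.

Example: encoding and decoding a URL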

  from urllib.parse import quote, unquote

  # quote() leaves '/' unescaped by default (safe='/')
  encoded_url = quote('http://www.example.com/search?keyword=Python')
  print(encoded_url)  # http%3A//www.example.com/search%3Fkeyword%3DPython

  decoded_url = unquote('http%3A//www.example.com/search%3Fkeyword%3DPython')
  print(decoded_url)  # http://www.example.com/search?keyword=Python
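
quote() is meant for individual components rather than a whole URL, which is why the example above also escapes the ':' after the scheme. In practice a URL is usually taken apart with urlparse() and its query string built with urlencode(). The following sketch shows those functions together; the parameter names and values are made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse

# Split a URL into its components.
parts = urlparse('http://www.example.com/search?keyword=Python')
print(parts.scheme, parts.netloc, parts.path, parts.query)

# Turn the query string into a dict of lists (a key may repeat).
params = parse_qs(parts.query)
print(params)

# Build a query string from a dict; spaces become '+'.
query = urlencode({'keyword': 'Python 3', 'page': 2})
print(query)

# Encode a single value; safe='' also escapes '/'.
encoded_value = quote('a/b c', safe='')
print(encoded_value)

# Append the encoded query to an already valid base URL.
url = 'http://www.example.com/search?' + query
print(url)
```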

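d. The urllib.robotparser module:

urllib.robotparser handles robots.txt files, which tell web crawlers which pages they may fetch. It provides the RobotFileParser class, which parses a robots.txt file and reports whether a given URL may be fetched.

Example: checking whether a URL may be fetched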

  from urllib.robotparser import RobotFileParser

  rp = RobotFileParser()
  rp.set_url('http://www.example.com/robots.txt')
  rp.read()

  if rp.can_fetch('*', 'http://www.example.com/page.html'):
      print('URL can be fetched')
  else:
      print('URL cannot be fetched')
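
To see how the rules are applied without downloading anything, RobotFileParser can also be fed the lines of a robots.txt file directly through parse(). The rules below are a hypothetical robots.txt written for this sketch, not the contents of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline instead of downloaded.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse() takes an iterable of lines

allowed = rp.can_fetch('*', 'http://www.example.com/page.html')
blocked = rp.can_fetch('*', 'http://www.example.com/private/page.html')
print(allowed, blocked)
```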

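II. The urllib2 library

1. Differences:

urllib2 is a URL-handling library in Python 2.x. In Python 3.x, urllib2 was merged into the urllib package (its functionality now lives in urllib.request and urllib.error), so urllib2 cannot be imported in Python 3.x.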

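2. Usage:

urllib2 provides classes and functions for sending HTTP requests and handling responses. The examples in this part are Python 2.x code.

a. Sending HTTP requests and receiving responses with urllib2.urlopen():

Example: sending a GET request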

  import urllib2

  response = urllib2.urlopen('http://www.example.com')
  html = response.read()
  print(html)

Example: sending a POST request

  import urllib2

  data = 'key1=value1&key2=value2'
  request = urllib2.Request('http://www.example.com', data)
  response = urllib2.urlopen(request)
  html = response.read()
  print(html)
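
In Python 3 the same POST request is built with urllib.request, with one difference worth noting: the request body must be bytes rather than str. The sketch below is a Python 3 counterpart written under that assumption. It only constructs the request and inspects it; the sending step is left commented out because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

# urlencode() builds the form body and escapes special characters.
fields = {'key1': 'value1', 'key2': 'value2'}
data = urlencode(fields).encode('ascii')  # bytes are required in Python 3

# Supplying data makes the request a POST.
request = Request('http://www.example.com', data=data)
print(request.get_method())
print(request.data)

# To send it (requires network access):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     html = response.read()
```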

b. Building an HTTP request with the urllib2.Request class:

Example: building a custom HTTP request

  import urllib2

  url = 'http://www.example.com'
  headers = {'User-Agent': 'Mozilla/5.0'}
  request = urllib2.Request(url, headers=headers)
  
  response = urllib2.urlopen(request)
  html = response.read()
  print(html)
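
The Request class carries over to Python 3 as urllib.request.Request with the same headers argument. The following sketch, written for Python 3, builds the same request and inspects its headers without sending anything; the extra Accept header is an assumption added only to show add_header().

```python
from urllib.request import Request

url = 'http://www.example.com'
headers = {'User-Agent': 'Mozilla/5.0'}
request = Request(url, headers=headers)

# Headers can also be added after construction.
request.add_header('Accept', 'text/html')

# Request stores header names via str.capitalize(), e.g. 'User-agent'.
print(request.header_items())
print(request.get_header('User-agent'))
print(request.get_method())  # no data was supplied, so this is a GET
```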

c. Handling exceptions raised during an HTTP request:

Example: catching and handling an exception

  import urllib2
  from urllib2 import URLError

  try:
      response = urllib2.urlopen('http://www.example.com')
      html = response.read()
      print(html)
  except URLError as e:
      # In Python 2, print('a', b) would print a tuple, so format the message first
      print('An error occurred: {0}'.format(e.reason))

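In summary, urllib and urllib2 both handle URL operations in Python, providing classes and functions for sending HTTP requests, processing responses, and encoding URLs. Which one to use depends on the Python version: Python 2.x code uses urllib2, and Python 3.x code uses urllib. Within either library, pick the module and function that match the task at hand.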
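
For code that has to run on both versions, one common approach is to try the Python 3 imports first and fall back to urllib2. This is a sketch of that pattern, not the only way to do it; the request built at the end is just a quick check that the imported names work.

```python
try:
    # Python 3: the urllib2 functionality lives in urllib.request and urllib.error.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
except ImportError:
    # Python 2: fall back to urllib2, which exposes the same names.
    from urllib2 import Request, urlopen, HTTPError, URLError

# From here on the code can use the names without caring about the version.
request = Request('http://www.example.com', headers={'User-Agent': 'Mozilla/5.0'})
print(request.get_method())
```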