Python six.moves.urllib.parse: the urljoin() function explained with examples

Published: 2023-12-23 04:54:26

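urljoin() builds a full URL by resolving a URL (usually a relative one) against a base URL. six.moves.urllib.parse exposes the function under a single import path for Python 2 and Python 3; on Python 3 it is the same function as urllib.parse.urljoin, which the examples here import directly.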

Syntax:

urllib.parse.urljoin(base, url, allow_fragments=True)
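As a minimal sketch of the six import path named in the title: the snippet below tries six.moves first and falls back to the standard library, on the assumption that six (a third-party package) may not be installed. The URL is a placeholder.

```python
# Prefer the six compatibility path when six is installed; otherwise fall
# back to the Python 3 standard library location (same function on Python 3).
try:
    from six.moves.urllib.parse import urljoin
except ImportError:  # six is a third-party package and may be absent
    from urllib.parse import urljoin

print(urljoin('https://www.example.com/docs/', 'intro.html'))
# Output: https://www.example.com/docs/intro.html
```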

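Parameters:

- base: the base URL that url is resolved against

- url: the URL to join, usually relative; an absolute URL replaces base entirely

- allow_fragments: whether '#' is recognized as a fragment delimiter when parsing; defaults to True. Setting it to False does not remove the fragment: the '#...' part is then kept as part of the path or query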

Return value:

The joined URL. It is a complete (absolute) URL when base is absolute.
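How the result is formed depends on the shape of the url argument. The following sketch runs through the common cases; the hosts and paths are made-up placeholders.

```python
from urllib.parse import urljoin

base = 'https://www.example.com/a/b/page.html'

# Relative path: replaces the last segment of the base path.
print(urljoin(base, 'other.html'))
# Output: https://www.example.com/a/b/other.html

# '..' climbs one directory before resolving.
print(urljoin(base, '../up.html'))
# Output: https://www.example.com/a/up.html

# A leading '/' resolves from the site root.
print(urljoin(base, '/root.html'))
# Output: https://www.example.com/root.html

# A leading '//' keeps only the scheme of the base.
print(urljoin(base, '//cdn.example.org/lib.js'))
# Output: https://cdn.example.org/lib.js

# An absolute URL replaces the base entirely.
print(urljoin(base, 'http://other.example.net/x'))
# Output: http://other.example.net/x
```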

Examples:

from urllib.parse import urljoin

# Join a relative URL onto a base URL
base_url = 'https://www.example.com/'
relative_url = 'path/to/something.html'
absolute_url = urljoin(base_url, relative_url)
print(absolute_url)
# Output: https://www.example.com/path/to/something.html

# Join a URL that contains a # fragment
base_url = 'https://www.example.com/path/to/'
relative_url = 'something.html#section1'
absolute_url = urljoin(base_url, relative_url)
print(absolute_url)
# Output: https://www.example.com/path/to/something.html#section1

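# allow_fragments=False does not strip the fragment: '#' is simply no longer
# treated as a fragment delimiter, so '#section1' stays in the path
base_url = 'https://www.example.com/path/to/'
relative_url = 'something.html#section1'
absolute_url = urljoin(base_url, relative_url, allow_fragments=False)
print(absolute_url)
# Output: https://www.example.com/path/to/something.html#section1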

# When base_url is itself a relative reference
base_url = '/path/to/'
relative_url = 'something.html'
absolute_url = urljoin(base_url, relative_url)
print(absolute_url)
# Output: /path/to/something.html

Each example above defines a base URL (base_url) and a relative URL (relative_url), then calls urljoin() to resolve the relative URL against the base and stores the result in absolute_url.

In the first example, base_url is 'https://www.example.com/' and relative_url is 'path/to/something.html', so the result is 'https://www.example.com/path/to/something.html'.
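One detail worth checking when writing a base URL is its trailing slash. Without it, the last path segment is treated as a file name and is replaced rather than extended. A small sketch, reusing the article's example host:

```python
from urllib.parse import urljoin

# Base ends with '/': 'to/' is treated as a directory and is kept.
with_slash = urljoin('https://www.example.com/path/to/', 'something.html')
print(with_slash)
# Output: https://www.example.com/path/to/something.html

# Base has no trailing '/': 'to' is treated as a file and is replaced.
without_slash = urljoin('https://www.example.com/path/to', 'something.html')
print(without_slash)
# Output: https://www.example.com/path/something.html
```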

The second example joins a URL that contains a # fragment. base_url is 'https://www.example.com/path/to/' and relative_url is 'something.html#section1', so the result is 'https://www.example.com/path/to/something.html#section1'.

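The third example uses the same base_url and relative_url as the second but passes allow_fragments=False. This does not remove the fragment. The flag only controls whether '#' is parsed as a fragment delimiter; with False, '#section1' is kept as part of the path, so the result is still 'https://www.example.com/path/to/something.html#section1'.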
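When the fragment needs to be removed from the result, one reliable option is the standard library's urllib.parse.urldefrag(), which splits a URL into the part before '#' and the fragment itself. A minimal sketch, with illustrative variable names:

```python
from urllib.parse import urljoin, urldefrag

joined = urljoin('https://www.example.com/path/to/', 'something.html#section1')

# urldefrag() returns a (url, fragment) named tuple.
url_without_fragment, fragment = urldefrag(joined)
print(url_without_fragment)
# Output: https://www.example.com/path/to/something.html
print(fragment)
# Output: section1
```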

In the last example, base_url is itself a relative reference ('/path/to/'), so the result, '/path/to/something.html', is also relative: it has no scheme or host.

urljoin() makes it easy to resolve relative links into complete URLs, which is especially useful when building crawlers or otherwise processing URLs.
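To illustrate the crawler use case, here is a rough sketch that collects the links on a page and resolves each one against the page's own URL. The LinkCollector class, the sample HTML, and the page URL are all hypothetical and exist only for this illustration; a real crawler would fetch the HTML over the network and handle many more cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute targets of <a href="..."> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Resolve each href against the URL of the page it came from.
                self.links.append(urljoin(self.page_url, value))


# Made-up page content standing in for a downloaded document.
sample_html = '''
<a href="about.html">About</a>
<a href="/contact">Contact</a>
<a href="https://docs.example.org/">Docs</a>
'''

collector = LinkCollector('https://www.example.com/blog/index.html')
collector.feed(sample_html)
for link in collector.links:
    print(link)
# Output:
# https://www.example.com/blog/about.html
# https://www.example.com/contact
# https://docs.example.org/
```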