How to Get the Links on the Current Web Page in Python

Published: 2023-12-16 08:46:42

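In Python, you can use the requests library together with BeautifulSoup (installed as the beautifulsoup4 package) to collect the links that appear on a web page. The steps below walk through the method and end with a complete example.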

1. Use the requests library to send an HTTP request and fetch the page content.

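import requests

url = "https://example.com"
# Set a timeout so the call cannot hang indefinitely
response = requests.get(url, timeout=10)
# Raise an exception for 4xx/5xx responses instead of parsing an error page
response.raise_for_status()
html_content = response.text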

2. Use BeautifulSoup to parse the page content and find every <a> tag.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, "html.parser")
links = soup.find_all("a")

3. Read the href attribute of each tag to build the list of links on the page.

current_page_links = []
for link in links:
    href = link.get("href")
    # <a> tags without an href attribute return None; skip them
    if href:
        current_page_links.append(href)
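
The href values collected in step 3 are returned exactly as written in the HTML, so they can be relative paths ("/about"), duplicates, or non-web schemes such as mailto:. The following is a minimal sketch of one way to clean them up using only the standard library. The helper name normalize_links and the choice to keep only http and https URLs are assumptions made for illustration, not part of the original method.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep unique http(s) URLs in order.

    Hypothetical helper for illustration; base_url is the page the hrefs came from.
    """
    result = []
    seen = set()
    for href in hrefs:
        if not href:
            # Skip None (tag had no href) and empty strings
            continue
        absolute = urljoin(base_url, href.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            # Drop mailto:, javascript:, tel: and similar non-web links
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

With the variables from the steps above, this could be called as normalize_links(url, current_page_links) before printing the list.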

4. Print the list of links found on the page.

for link in current_page_links:
    print(link)

These steps produce the list of links on the page. The complete code is as follows:

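import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# Fetch the page; the timeout keeps the call from hanging indefinitely
response = requests.get(url, timeout=10)
# Raise an exception for 4xx/5xx responses instead of parsing an error page
response.raise_for_status()
html_content = response.text

# Parse the HTML and find every <a> tag
soup = BeautifulSoup(html_content, "html.parser")
links = soup.find_all("a")

# Collect the href of each tag, skipping tags that have none
current_page_links = []
for link in links:
    href = link.get("href")
    if href:
        current_page_links.append(href)

for link in current_page_links:
    print(link)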

Running this code prints the links found on the page, one per line.

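Note: this method only collects the links that appear on the page you requested. To follow those links to further pages, extra processing is needed: fetch each linked page with requests, parse it with BeautifulSoup, and extract its links in turn, repeating until you reach the target links. Keep track of the URLs you have already visited so that the loop terminates.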
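
The repeated fetch-and-parse loop described in the note can be sketched as a breadth-first traversal. This is one possible arrangement under stated assumptions, not a definitive crawler: the function names fetch_links and crawl, the max_pages cap, and the choice to stay on the starting host are all hypothetical and introduced here only for illustration.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def fetch_links(url):
    """Return the absolute links on one page, using requests and BeautifulSoup."""
    # Imported here so the traversal logic below can be exercised
    # with a stand-in function and no network access.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]


def crawl(start_url, get_links=fetch_links, max_pages=20):
    """Visit pages breadth-first from start_url, staying on the same host.

    Returns the URLs visited, in order. max_pages is an assumed safety cap.
    """
    host = urlparse(start_url).netloc
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            links = get_links(url)
        except OSError:
            # requests' exceptions derive from OSError; skip pages that fail
            continue
        visited.append(url)
        for link in links:
            link = urldefrag(link).url  # ignore "#fragment" differences
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("https://example.com") would fetch real pages, so a low max_pages value and a check of the site's crawling policy are advisable before running it. Passing a different get_links function makes it possible to try the traversal on canned data first.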

Combining requests and BeautifulSoup is a simple and effective way to collect the links on a page. It is useful for web scraping, data analysis, and similar tasks.