Scraping Various Kinds of Weather Information with Python 3

Published: 2023-05-18 13:09:27

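Weather is one of the most useful pieces of everyday information. Whether you are commuting, traveling, or working, you need to know the local conditions. So how can you get weather information quickly? Web scraping can help.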

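In this article we write some Python 3 code to scrape several kinds of weather information: city weather, air quality, the 24-hour forecast, the 7-day forecast, and sunrise and sunset times.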

1. Scraping city weather information

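We can scrape weather information for a given location from a meteorological website such as the China Weather Network (http://www.weather.com.cn/). Specifically, we send a GET request to the site and extract specific fields (weather conditions, temperature, wind, and so on) from the returned page source.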

Code implementation:

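# Import modules
import requests
from bs4 import BeautifulSoup

# Build the request headers (a well-formed browser User-Agent string)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Set the request URL (101010100 is the city code for Beijing)
url = 'http://www.weather.com.cn/weather/101010100.shtml'

# Send the GET request, raise on HTTP errors, and let requests guess the page encoding
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding

# Parse the page source and extract the temperature and weather description.
# find() returns None when nothing matches, so guard before reading the text.
soup = BeautifulSoup(response.text, 'html.parser')
temp_tag = soup.find('p', class_="tem")
weather_tag = soup.find('p', class_="wea")
temp_now = temp_tag.get_text(strip=True) if temp_tag else 'N/A'
weather_now = weather_tag.get_text(strip=True) if weather_tag else 'N/A'

# Print the temperature and weather description
print('Current temperature:', temp_now)
print('Current weather:', weather_now)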

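We use the BeautifulSoup module to parse the page source, locate the temperature and weather-condition elements with the find method, and print them. Note that the `tem` element in the static HTML may belong to today's forecast entry rather than to a live observation; real-time readings may be loaded separately by JavaScript, so check the element in your browser's developer tools before relying on it.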
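The only city-specific part of the URL above is the numeric code (101010100 is Beijing). Below is a minimal sketch of a helper that builds the page URL for another city, assuming every city page follows the same `/weather/<code>.shtml` pattern as the example. `build_forecast_url` is a hypothetical name, not part of any library.

```python
# Hypothetical helper: builds a China Weather Network page URL from a city code.
# Assumption: all city pages share the pattern of the Beijing example
# http://www.weather.com.cn/weather/101010100.shtml
BASE_URL = "http://www.weather.com.cn/weather/{code}.shtml"


def build_forecast_url(city_code) -> str:
    """Return the forecast page URL for a numeric city code such as '101010100'."""
    code = str(city_code).strip()
    # Reject anything that is not a plain ASCII digit string, e.g. a city name
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"city code must contain only digits, got {city_code!r}")
    return BASE_URL.format(code=code)


if __name__ == "__main__":
    print(build_forecast_url("101010100"))
```

The returned string can be passed to `requests.get` in place of the hard-coded `url`.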

2. Scraping air-quality information

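Air quality is an important environmental indicator, and real-time air-quality data is available from the China National Environmental Monitoring Centre (http://www.cnemc.cn/). As in the previous example, we build request headers, send a GET request to the site, and extract specific fields from the page source (AQI, air-quality level, PM2.5, PM10, and so on); the example below extracts the AQI and the air-quality level.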

Code implementation:

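# Import modules
import requests
from bs4 import BeautifulSoup

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Set the request URL
url = 'http://www.cnemc.cn/'

# Send the GET request, raise on HTTP errors, and let requests guess the page encoding
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding

# Parse the page source and extract the AQI and air-quality level.
# find() returns None when nothing matches, so guard before reading the text.
soup = BeautifulSoup(response.text, 'html.parser')
aqi_tag = soup.find('div', class_="aqivalue")
level_tag = soup.find('div', class_="quality")
aqi_now = aqi_tag.get_text(strip=True) if aqi_tag else 'N/A'
level_now = level_tag.get_text(strip=True) if level_tag else 'N/A'

# Print the AQI and air-quality level
print('Current AQI:', aqi_now)
print('Current air-quality level:', level_now)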

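Again, BeautifulSoup parses the page source, and the find method extracts the AQI and the air-quality level, which are then printed. The class names `aqivalue` and `quality` depend on the site's current markup. If the figures are filled in by JavaScript after the page loads, they will not appear in the HTML returned by requests at all, so inspect the raw response before relying on these selectors.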
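The scraped AQI is a string. To turn a number into a category independently of how a page labels it, the value can be checked against the six AQI bands of China's technical regulation HJ 633-2012: 0-50 excellent, 51-100 good, 101-150 lightly polluted, 151-200 moderately polluted, 201-300 heavily polluted, and above 300 severely polluted. A minimal sketch; `aqi_category` is a hypothetical helper:

```python
# Upper bound of each AQI band and its category, per HJ 633-2012.
# Values above 300 fall into the last category, "Severely polluted".
AQI_BANDS = [
    (50, "Excellent"),             # 优
    (100, "Good"),                 # 良
    (150, "Lightly polluted"),     # 轻度污染
    (200, "Moderately polluted"),  # 中度污染
    (300, "Heavily polluted"),     # 重度污染
]


def aqi_category(aqi: float) -> str:
    """Return the air-quality category for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return "Severely polluted"  # 严重污染


if __name__ == "__main__":
    # The scraped value is text, so convert it first, e.g. int(aqi_now)
    print(aqi_category(int("85")))  # Good
```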

3. Scraping the 24-hour forecast

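The China Weather Network (http://www.weather.com.cn/) provides a 24-hour forecast that shows the weather over the next 24 hours. Again we build request headers, send a GET request to the site, and extract specific fields (time, weather conditions, temperature, and so on) from the page source.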

Code implementation:

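# Import modules
import requests
from bs4 import BeautifulSoup

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Set the request URL. The '#24' anchor used in the browser is a fragment and is
# never sent to the server, so the plain page URL is requested.
url = 'http://www.weather.com.cn/weather/101010100.shtml'

# Send the GET request, raise on HTTP errors, and let requests guess the page encoding
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding


def text_of(tag):
    """Return a tag's stripped text, or '' if the tag was not found."""
    return tag.get_text(strip=True) if tag else ''


# Parse the page source and collect the forecast entries
soup = BeautifulSoup(response.text, 'html.parser')
wea_24 = soup.find_all('li', class_="skyid")

# Loop over wea_24 and print the forecast details of each entry
for item in wea_24:
    date = text_of(item.find('h1'))
    weather = text_of(item.find('p', class_="wea"))
    temperature = text_of(item.find('p', class_="tem"))
    print(date, weather, temperature)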

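Again, BeautifulSoup parses the page source, find_all returns every matching forecast entry, and the loop prints the details of each one. Note that the `#24` at the end of the browser URL is a fragment: it is not sent to the server, so requests receives the same HTML as in section 1. Check in the browser's developer tools that the `li` elements with class `skyid` really hold hourly entries; on this page the hourly data may instead be embedded in JavaScript.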
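Python's standard library can show which part of a URL is actually requested: `urllib.parse.urldefrag` splits off the fragment, and the remaining URL is identical for the plain, `#24`, and `#7d` variants used in this article. A small check:

```python
from urllib.parse import urldefrag

page_urls = [
    "http://www.weather.com.cn/weather/101010100.shtml",
    "http://www.weather.com.cn/weather/101010100.shtml#24",
    "http://www.weather.com.cn/weather/101010100.shtml#7d",
]

# urldefrag returns (url_without_fragment, fragment); only the first part reaches the server
requested = {urldefrag(u).url for u in page_urls}
fragments = [urldefrag(u).fragment for u in page_urls]

print(requested)   # a single URL
print(fragments)   # ['', '24', '7d']
```

Different parts of the page therefore have to be told apart by their HTML elements, not by the URL fragment.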

4. Scraping the 7-day forecast

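The China Weather Network (http://www.weather.com.cn/) also provides a 7-day forecast covering the coming week. Again we build request headers, send a GET request to the site, and extract specific fields (date, weather conditions, temperature, and so on) from the page source.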

Code implementation:

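# Import modules
import requests
from bs4 import BeautifulSoup

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Set the request URL ('#7d' is a browser-side fragment and is not sent to the server)
url = 'http://www.weather.com.cn/weather/101010100.shtml'

# Send the GET request, raise on HTTP errors, and let requests guess the page encoding
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding


def text_of(tag):
    """Return a tag's stripped text, or '' if the tag was not found."""
    return tag.get_text(strip=True) if tag else ''


# Parse the page source and locate the 7-day forecast container.
# Assumption: the container is <div id="7d">, so "7d" is matched as an id, not a class.
soup = BeautifulSoup(response.text, 'html.parser')
wea_7 = soup.find('div', id="7d")
if wea_7 is None:
    raise SystemExit('7-day forecast container not found; the page layout may have changed')

# Loop over the <li> entries in wea_7 and print the forecast for each date.
# The wind direction may only be stored in span title attributes, not in visible text.
for item in wea_7.find_all('li'):
    date = text_of(item.find('h1'))
    weather = text_of(item.find('p', class_="wea"))
    temperature = text_of(item.find('p', class_="tem"))
    wind = text_of(item.find('p', class_="win"))
    print(date, weather, temperature, wind)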

Again, BeautifulSoup parses the page source: find locates the 7-day forecast container, and the loop prints the forecast details for each date.
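The temperature extracted from each entry is text such as `28/17℃` (an illustrative value). To compare or plot the forecast, it can be split into integers. Below is a minimal sketch, assuming an entry with a single value holds only the low temperature; `parse_temp_range` is a hypothetical helper, and taking the max and min of the two numbers avoids depending on their order.

```python
import re

# Signed integers inside strings such as "28/17℃", "28℃/17℃" or "-5/-12℃"
NUMBER_RE = re.compile(r"-?\d+")


def parse_temp_range(text: str):
    """Return (high, low) in degrees Celsius; high is None for a single value.

    Assumption: when only one number is present, it is the low temperature.
    """
    values = [int(v) for v in NUMBER_RE.findall(text)]
    if not values:
        raise ValueError(f"no temperature found in {text!r}")
    if len(values) == 1:
        return None, values[0]
    first, second = values[0], values[1]
    return max(first, second), min(first, second)


if __name__ == "__main__":
    print(parse_temp_range("28/17℃"))  # (28, 17)
```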

5. Scraping sunrise and sunset times

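Finally, we scrape the local sunrise and sunset times from the China Weather Network (http://www.weather.com.cn/). Again we build request headers, send a GET request to the site, and extract specific fields (sunrise time and sunset time) from the page source.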
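Once extracted, the two times are plain text. As an example of post-processing them, here is a minimal sketch that computes the length of daylight, assuming each string contains a time in `HH:MM` form. The sample strings are hypothetical, not copied from the site, and `parse_clock` and `day_length` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

# First "H:MM" or "HH:MM" time found in a string
TIME_RE = re.compile(r"\d{1,2}:\d{2}")


def parse_clock(text: str) -> datetime:
    """Parse the first HH:MM time in text (the date part is a placeholder)."""
    match = TIME_RE.search(text)
    if match is None:
        raise ValueError(f"no HH:MM time found in {text!r}")
    return datetime.strptime(match.group(), "%H:%M")


def day_length(sunrise: str, sunset: str) -> timedelta:
    """Return the time between sunrise and sunset on the same day."""
    start, end = parse_clock(sunrise), parse_clock(sunset)
    if end <= start:
        raise ValueError("sunset must be later than sunrise")
    return end - start


if __name__ == "__main__":
    print(day_length("日出 05:01", "日落 19:32"))  # 14:31:00
```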

Code implementation:

# Import modules
import requests
from bs4 import BeautifulSoup

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Set the request URL
url = 'http://www