
Fetching Historical Weather Data with a Python Crawler

Time: 2021-04-13 01:09:14


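A city's historical weather can be looked up on the 天气后报 (Tianqi Houbao) website.

For analysis that needs a large amount of historical weather data, a crawler can collect it instead of looking each month up by hand.

As an example, suppose we want the weather summary for Beijing in September. The site presents that month as a list of daily records.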

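To scrape the data in this list, first set the request headers. Custom headers are one way to get past anti-scraping checks when using requests: they make the request look as if it came from a browser rather than from a script. For sites that block crawlers, setting a few headers such as User-Agent lets the program imitate a browser visiting the page.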

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, compress',
    'Accept-Language': 'en-us;q=0.5,en;q=0.3',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/0101 Firefox/22.0'
}
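To see what these headers change, it helps to compare the User-Agent that requests sends by default with the one set above. The sketch below only prepares a request and never sends it. The URL is a placeholder on example.com, not the real site, and browser_headers is a trimmed copy of the dictionary above.

```python
import requests

# Trimmed copy of the headers shown above.
browser_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/0101 Firefox/22.0",
}

# Placeholder URL (assumption): substitute the real page address.
url = "https://example.com/lishi/beijing/month/202009.html"

# Without custom headers, requests identifies itself as a script.
default_agent = requests.utils.default_user_agent()
print(default_agent)  # starts with "python-requests/"

# Preparing a request builds it without touching the network.
prepared = requests.Request("GET", url, headers=browser_headers).prepare()
print(prepared.headers["User-Agent"])
```

In the scraper itself the same dictionary is passed as requests.get(url, headers=headers). Adding a timeout argument to that call is a common precaution so a stalled server does not hang the script.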

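Notes:

(1) Following the pattern of the page URL, the city name and the year-month part can be replaced to scrape whichever weather data is needed.

(2) Beautiful Soup is a Python library for parsing HTML and XML. It provides simple, Pythonic functions for navigating, searching, and modifying the parse tree. It works as a toolbox that parses a document and hands back the data you want to extract.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8, so you usually do not need to think about encodings.

(3) Build the weather table. The high and low temperatures are split into separate columns here.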

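import re

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup


def GetWeather(year, month, city):
    # year, month and city are strings. Only the path is given here:
    # prepend the site's scheme and host before running.
    url = '/lishi/' + city + '/month/' + year + month + '.html'
    htmlsingle = requests.get(url, headers=headers)
    # Encode the text back to bytes; Beautiful Soup then detects the encoding itself.
    t = htmlsingle.text.encode(htmlsingle.encoding)
    soup = BeautifulSoup(t, 'lxml')
    # Collect every table cell and drop the four header cells.
    TextList = []
    tagh3 = soup.find_all('td')
    del tagh3[:4]
    for each in tagh3:
        TextList.append(each.text)
    # Strip newlines and spaces from each cell.
    TextList = [re.sub('[\n\r ]', '', v) for v in TextList]
    # Four cells per row: date, weather, high/low temperature, wind.
    WeatherDf = pd.DataFrame(np.array(TextList).reshape(int(len(TextList) / 4), 4))
    WeatherDf.columns = ['date', 'weather', 'high_low', 'wind']
    # Split 'high/low' at the slash and remove the degree sign.
    low = []
    high = []
    for i in range(0, len(WeatherDf)):
        a = re.search('/', WeatherDf.high_low[i]).span()
        high.append(WeatherDf.high_low[i][:a[0]].replace("℃", ""))
        low.append(WeatherDf.high_low[i][a[1]:].replace("℃", ""))
    WeatherDf['high'] = high
    WeatherDf['low'] = low
    WeatherDf = WeatherDf.loc[:, ['date', 'weather', 'high', 'low', 'wind']]
    return WeatherDf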
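The parsing steps inside GetWeather can be tried without any network access by feeding Beautiful Soup a small hand-written table. The HTML below is invented for illustration and only mimics the four-column layout the function assumes (date, weather, high/low temperature, wind); the real page may differ. This sketch uses Python's built-in html.parser backend instead of lxml so it needs no extra parser, and split_high_low is a hypothetical helper that also converts the temperatures to integers.

```python
import re

from bs4 import BeautifulSoup

# Invented sample (assumption): one header row plus two data rows.
SAMPLE_HTML = """
<table>
  <tr><td>Date</td><td>Weather</td><td>Temperature</td><td>Wind</td></tr>
  <tr><td> 2020-09-01 </td><td> Sunny </td><td> 30℃ / 19℃ </td><td> South </td></tr>
  <tr><td> 2020-09-02 </td><td> Cloudy </td><td> 28℃ / 20℃ </td><td> North </td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect the text of every cell and strip whitespace, as GetWeather does.
cells = [re.sub(r"[\n\r ]", "", td.text) for td in soup.find_all("td")]
cells = cells[4:]  # drop the four header cells

# Regroup the flat list into rows of four cells.
rows = [cells[i:i + 4] for i in range(0, len(cells), 4)]


def split_high_low(value):
    """Split a string such as '30℃/19℃' into (high, low) integers."""
    high, low = value.replace("℃", "").split("/")
    return int(high), int(low)


for date, weather, high_low, wind in rows:
    high, low = split_high_low(high_low)
    print(date, weather, high, low, wind)
```

Converting the temperatures to integers is a choice made in this sketch; GetWeather keeps them as strings, which is enough for display but would need a conversion before any numeric analysis.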

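The historical weather data scraped for Beijing in September comes back as a table with the columns date, weather, high, low, and wind.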
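Note (1) above mentions swapping the city and the year-month part of the address. A minimal sketch of generating those addresses for a range of months follows. BASE_URL is a placeholder because the site's host is not given here, and month_urls is a hypothetical helper, not part of the original code. It assumes the site expects a six-digit year-month such as 202009, so the month is zero-padded.

```python
# Placeholder (assumption): the real host is not given in the article.
BASE_URL = "https://example.com"


def month_urls(city, year, first_month, last_month):
    """Return one monthly URL per month in the inclusive range."""
    urls = []
    for month in range(first_month, last_month + 1):
        # Zero-pad the month so that 9 becomes '09'.
        urls.append(f"{BASE_URL}/lishi/{city}/month/{year}{month:02d}.html")
    return urls


for url in month_urls("beijing", 2020, 7, 9):
    print(url)
```

Calling GetWeather once per month and combining the returned tables with pandas.concat would then give a single table covering the whole period.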
