
Native Python Crawler: Scraping Streamer Names and Popularity Values from a Live-Streaming Site in 50 Lines of Code

Time: 2022-01-26 14:51:51


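1. Before You Crawl

Start by defining the goal: for example, scrape the names and popularity values of the streamers in the League of Legends section of the live-streaming site "某猫" (literally "a certain cat", the author's veiled name for the site). Then locate the web page that holds the data and analyze its structure to find the tags that contain it.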

2. Method

Simulate an HTTP request to the server, receive the HTML it returns, and extract the required data with regular expressions.
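The two steps described above, fetching the HTML and pulling fields out with a regular expression, can be sketched in a few lines. This is a minimal illustration rather than the full spider: the helper names `fetch_html` and `extract_numbers` and the sample fragment are assumptions made for this example, and only `extract_numbers` is exercised so that nothing touches the network.

```python
import re
from urllib import request


def fetch_html(url, timeout=10):
    """Send an HTTP GET request and return the response body as text."""
    with request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode('utf-8')


def extract_numbers(html):
    """Return the text inside every <span class="video-number"> element."""
    # The parentheses capture only the text between the tags;
    # [\s\S]*? matches any characters (newlines included) non-greedily.
    pattern = r'<span class="video-number">([\s\S]*?)</span>'
    return re.findall(pattern, html)


# Hypothetical fragment shaped like the page analysed in this article.
sample = ('<span class="video-number">851</span>'
          '<span class="video-number">846</span>')
print(extract_numbers(sample))
```

The non-greedy `*?` is what keeps each match from running on to the last `</span>` on the page.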

3. Code Example

The complete code is below: a native Python crawler, using only the standard library, in about 50 lines.

import re  # regular-expression module
from urllib import request  # used to fetch the HTML page


class Spider():
    url = 'https://www.panda.tv/cate/lol'
    # The parentheses form a capture group: only the content between the tags is extracted
    root_pattern = r'<div class="video-info">([\s\S]*?)</div>'
    name_pattern = r'</i>([\s\S]*?)</span>'
    number_pattern = r'<span class="video-number">([\s\S]*?)</span>'

    def __fetch_content(self):  # private method: fetch the HTML page
        r = request.urlopen(Spider.url)  # read the class variable url inside an instance method
        htmls = r.read()  # raw bytes
        htmls = str(htmls, encoding='utf-8')  # decode the bytes into a string
        print(htmls)
        return htmls

    def __analysis(self, htmls):
        # Parse the HTML text and use regular expressions to extract
        # each streamer's name and popularity value
        root_html = re.findall(Spider.root_pattern, htmls)
        anchors = []
        for html in root_html:
            name = re.findall(Spider.name_pattern, html)
            number = re.findall(Spider.number_pattern, html)
            anchor = {'name': name, 'number': number}
            anchors.append(anchor)
        print(anchors)
        return anchors

    def __refine(self, anchors):
        # Refine the data: strip spaces, newlines and similar noise so it is easy to read
        targets = []
        for target_list in anchors:
            name = target_list['name'][0].strip()
            number = target_list['number'][0]
            one_people = {'name': name, 'number': number}
            targets.append(one_people)
        print(targets)
        return targets

    def __sort(self, anchors):  # sort the refined data by popularity, highest first
        anchors = sorted(anchors, key=self.__sort_seed, reverse=True)
        return anchors

    def __sort_seed(self, anchor):  # sort key
        # Match the whole number, including any decimal part (e.g. '152.0')
        r = re.findall(r'\d+\.?\d*', anchor['number'])
        number = float(r[0])
        if '万' in anchor['number']:  # 万 means ten thousand
            number *= 10000
        return number

    def __show(self, anchors):  # display the final scraped data
        for rank in range(0, len(anchors)):
            print('rank ' + str(rank + 1) + ':' + anchors[rank]['name'] + ' ' + anchors[rank]['number'])

    def go(self):  # public method: the entry point of Spider
        htmls = self.__fetch_content()
        anchors = self.__analysis(htmls)
        anchors = self.__refine(anchors)
        anchors = self.__sort(anchors)
        self.__show(anchors)


spider = Spider()
spider.go()

The data written to the console by the print() calls is shown below:

print(htmls)

data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-109" data-id="26657"><a target="_blank" href="26657" class="video-list-item-wrap" data-pdt-ele="0" data-id="26657" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/e9c7aaa62412bb248c6829b04c56a3c7/w338/h190.jpg" alt="【吸血鬼教学各种细节】"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="【吸血鬼教学各种细节】">【吸血鬼教学各种细节】</span><span class="video-nickname" title="有毒i吸血鬼"><i class="icon-hostlevel icon-hostlevel-11" data-level="11"></i>有毒i吸血鬼 </span><span class="video-number">851</span><span class="video-station-info"><i class="video-station-num">18人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-110" data-id="2276191"><a target="_blank" href="2276191" class="video-list-item-wrap" data-pdt-ele="0" data-id="2276191" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/6f46def671c207a8e4750a3c2ad2d092/w338/h190.jpg" alt="求订阅,artifact还可以"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="求订阅,artifact还可以">求订阅,artifact还可以</span><span class="video-nickname" title="高调的火星人"><i class="icon-hostlevel icon-hostlevel-0" data-level="0"></i>高调的火星人 </span><span class="video-number">846</span><span class="video-station-info"><i class="video-station-num">0人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item 
video-no-tag video-no-cate " data-pdt-block="sd1-111" data-id="1193989"><a target="_blank" href="1193989" class="video-list-item-wrap" data-pdt-ele="0" data-id="1193989" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/fa49782338c10a678915ecfb07891119/w338/h190.jpg" alt="刀妹专场这个中单刀妹最无敌不接受反驳"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="刀妹专场这个中单刀妹最无敌不接受反驳">刀妹专场这个中单刀妹最无敌不接受反驳</span><span class="video-nickname" title="爱唱歌的小南丶"><i class="icon-hostlevel icon-hostlevel-1" data-level="1"></i>爱唱歌的小南丶 </span><span class="video-number">825</span><span class="video-station-info"><i class="video-station-num">3人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-112" data-id="796585"><a target="_blank" href="796585" class="video-list-item-wrap" data-pdt-ele="0" data-id="796585" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/3ca5b2f3cbeb62b9a443fa73e4186d44/w338/h190.jpg" alt="青铜皇帝在线锤号!一礼炮=锤号➕房管"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="青铜皇帝在线锤号!一礼炮=锤号➕房管">青铜皇帝在线锤号!一礼炮=锤号➕房管</span><span class="video-nickname" title="熊猫尼古拉斯胖虎"><i class="icon-hostlevel icon-hostlevel-7" data-level="7"></i>熊猫尼古拉斯胖虎 </span><span class="video-number">820</span><span class="video-station-info"><i class="video-station-num">12人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " 
data-pdt-block="sd1-113" data-id="2274142"><a target="_blank" href="2274142" class="video-list-item-wrap" data-pdt-ele="0" data-id="2274142" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/fc998e1181f8ef099ca6af8ebb12e067/w338/h190.jpg" alt="大佬们助我升级鸭QAQ,求订阅"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"><span class="lottery-icon meepo-icon"><img src="https://i.h2.pdim.gs/38d2434036db929f9564512a6c865d02.png"></span></div></div><div class="video-info"><span class="video-title" title="大佬们助我升级鸭QAQ,求订阅">大佬们助我升级鸭QAQ,求订阅</span><span class="video-nickname" title="爱吃板栗123"><i class="icon-hostlevel icon-hostlevel-0" data-level="0"></i>爱吃板栗123 </span><span class="video-number">817</span><span class="video-station-info"><i class="video-station-num">67人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-114" data-id="2205730"><a target="_blank" href="2205730" class="video-list-item-wrap" data-pdt-ele="0" data-id="2205730" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/092783ef2fe6933a28b5b10c069dee01/w338/h190.jpg" alt="你的梦想~我来完成~"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"><span class="lottery-icon meepo-icon"><img src="https://i.h2.pdim.gs/38d2434036db929f9564512a6c865d02.png"></span></div></div><div class="video-info"><span class="video-title" title="你的梦想~我来完成~">你的梦想~我来完成~</span><span class="video-nickname" title="熊猫丶老白白"><i class="icon-hostlevel icon-hostlevel-2" data-level="2"></i>熊猫丶老白白 </span><span class="video-number">817</span><span class="video-station-info"><i class="video-station-num">6人</i></span></div></a><div class="video-label"><div 
class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a><a class="video-label-item label-color-4" href="/label/xshshl" data-pdt-ele="2">新手上路</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-115" data-id="962533"><a target="_blank" href="962533" class="video-list-item-wrap" data-pdt-ele="0" data-id="962533" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/fffeb6106fa0bffba3e88c3f50dbdbd1/w338/h190.jpg" alt="落羽:韩服励志冲王者"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="落羽:韩服励志冲王者">落羽:韩服励志冲王者</span><span class="video-nickname" title="落羽李青"><i class="icon-hostlevel icon-hostlevel-2" data-level="2"></i>落羽李青</span><span class="video-number">813</span><span class="video-station-info"><i class="video-station-num">9人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a><a class="video-label-item label-color-4" href="/label/xshshl" data-pdt-ele="2">新手上路</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-116" data-id="2249384"><a target="_blank" href="2249384" class="video-list-item-wrap" data-pdt-ele="0" data-id="2249384" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/0243211a018693e7eea6f4baea1f0200/w338/h190.jpg" alt="来场酣畅淋漓的战斗"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="来场酣畅淋漓的战斗">来场酣畅淋漓的战斗</span><span class="video-nickname" title="是欢欢呀丶"><i class="icon-hostlevel icon-hostlevel-0" data-level="0"></i>是欢欢呀丶 </span><span class="video-number">808</span><span class="video-station-info"><i 
class="video-station-num">0人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li><li class="video-list-item video-no-tag video-no-cate " data-pdt-block="sd1-117" data-id="2241576"><a target="_blank" href="2241576" class="video-list-item-wrap" data-pdt-ele="0" data-id="2241576" ><div class="video-cover "><img class="video-img video-img-lazy" data-original="https://i.h2.pdim.gs/90/eb6ba45bcc47bedca0e1430009a5233e/w338/h190.jpg" alt="面包强:佛系直播第一天"><div class="video-overlay"></div><div class="video-play"></div><div class="lottery-icon-list"></div></div><div class="video-info"><span class="video-title" title="面包强:佛系直播第一天">面包强:佛系直播第一天</span><span class="video-nickname" title="面包强"><i class="icon-hostlevel icon-hostlevel-0" data-level="0"></i>面包强 </span><span class="video-number">807</span><span class="video-station-info"><i class="video-station-num">4人</i></span></div></a><div class="video-label"><div class="video-label-content"><a class="video-label-item label-color-0" href="/cate/lol"data-pdt-ele="1">英雄联盟</a></div></div></li></ul></div><div id="pages-container"></div><div class="filter-list-empty"><img src="https://i.h2.pdim.gs/7deb952ddd943762e9591342e18777a6.png"><p>该条件下还没有开播的直播间</p></div></div></div><!--[if IE]><script>var protocol = location.protocol;window.__xdomainSlaves = {};window.__xdomainSlaves[protocol + '//xgame.gate.panda.tv'] = '/proxy.html';window.__xdomainSlaves[protocol + '//grank.panda.tv'] = '/proxy.html';</script><![endif]--><!-- IE 10- ajax 跨域方案 --><!--[if IE]><script src="https://s.h2.pdim.gs/static/f48d7cc521239cb0/xdomain.js"></script><script>(function() {var protocol = location.protocol;var slaves = {};slaves[protocol + '//u.panda.tv'] = '/proxy.html';slaves[protocol + '//roll.panda.tv'] = '/proxy.html';slaves[protocol + '//mall.gate.panda.tv'] = '/proxy.html';slaves[protocol + '//bag.gate.panda.tv'] = 
'/proxy.html';slaves[protocol + '//verify.panda.tv'] = '/proxy.html';slaves[protocol + '//api.feedback.panda.tv'] = '/proxy.html';slaves[protocol + '//api.report.panda.tv'] = '/proxy.html';slaves[protocol + '//ivern.gate.panda.tv'] = '/proxy.html';slaves[protocol + '//sharingan.gate.panda.tv'] = '/proxy.html';slaves[protocol + '//message.panda.tv'] = '/proxy.html';slaves[protocol + '//dakki.gate.panda.tv'] = '/proxy.html';slaves[protocol + '//device.gate.panda.tv'] = '/proxy.html';if (window.__xdomainSlaves) {for (var o in window.__xdomainSlaves) {if (window.__xdomainSlaves.hasOwnProperty(o) && window.__xdomainSlaves[o]) {slaves[o] = window.__xdomainSlaves[o];}}}xdomain.slaves(slaves);}());</script><![endif]--><script src="https://s.h2.pdim.gs/static/4f3413d6aa6afffe.js"></script><script src="https://s.h2.pdim.gs/static/88c2dfc29315a757/perfect-scrollbar-1.3.0.1.min.js"></script><script src="https://s.h2.pdim.gs/static/3c3a90b8afb5121f/ruc_v2.2.3.js"></script><!-- IE 9+ webkit firefox ajax 跨域方案 --><!--[if !IE]><!--><script>(function(window, $){$.ajaxPrefilter(function (options) {var link = document.createElement('a');link.setAttribute('href', options.url);if (link.hostname !== window.location.hostname && /(^|\.)panda(\.tv|tv\.com)$/.test(link.hostname)) {options.xhrFields = {withCredentials: true};options.crossDomain = true;}});var token = document.cookie.match(/I=r%3D\d+%26t%3D(\w+)/);token = token && token[1] || '';window._config_usertoken = token;$.ajaxSetup({data: {'token': token},cache: false,timeout: 6000});})(window, jQuery)</script><!--<![endif]--><script>window._config_env = "online";</script><script src="/cmstatic/global-config.js"></script><script src="https://s.h2.pdim.gs/static/6a912600c05df281.js"></script><script src="https://s.h2.pdim.gs/static/0d12f642ce7c9520.js"></script><script>var TOKEN = '';TOKEN && jQuery && jQuery.ajaxSetup({'data': {token: TOKEN}, cache: false, 'timeout': 6000});</script><script 
src="https://s.h2.pdim.gs/static/6577430c53548433.js"></script><script src="//www.panda.tv/cmstatic/psbar-config.js"></script><script src="https://s.h2.pdim.gs/static/978d3d6dc9fde486/PSbar_v2.2.6.js"></script><script src="https://s.h2.pdim.gs/staticdir/df83121c-1/fleet_1.0.11.js"></script><script src="https://s.h2.pdim.gs/staticdir/b2d4f8e3-1/perfectdatetimepicker/jquery.datetimepicker.js"></script><!-- biz js start --><script src="https://s.h2.pdim.gs/static/f3c547fe9602b509.js"></script><!-- biz js end --><script>window._config_webmonit_app = "pandaweb";</script><script src="https://s.h2.pdim.gs/static/4347a5cfa17f7e36/panda-monitor-3.1.4.js"></script><script>var _hmt = _hmt || [];(function() {var hm = document.createElement("script");hm.src = "///hm.js?204071a8b1d0b2a04c782c44b88eb996";var s = document.getElementsByTagName("script")[0];s.parentNode.insertBefore(hm, s);})();</script><script>(function() {window._smReadyFuncs = [];window.SMSdk = {ready: function(fn) {fn && _smReadyFuncs.push(fn);}};var sm = document.createElement("script");sm.src = ("https:" === document.location.protocol ? 
"https://s.h2.pdim.gs" : "http://s8.pdim.gs") + "/static/b9bfca82e08dcf1b.js";var s = document.getElementsByTagName("script")[0];s.parentNode.insertBefore(sm, s);SMSdk.ready(function() {var pdft = document.cookie.match(/pdft=(\w+)/);pdft && pdft[1] || $.ajax({type: "POST",url: "https://device.gate.panda.tv/pdft",dataType: "json",data: {vendor: 1,os: "web",data: JSON.stringify(SMSdk.getDeviceData() || {})},xhrFields: {withCredentials: true}}).then(function(res) {if(res.errno == 0 && res.data) {SMSdk.setDeviceId(res.data.deviceId);}});});})();</script><!-- pdtsdk start--><script src="https://s.h2.pdim.gs/static/611aa57a9ebe1431/pdt-sdk-v0.3.2.js"></script><!-- pdtsdk end--><script class="zhiCustomBtn" id="zhichiScript" src="/chat/frame/js/entrance.js?sysNum=b6f88b708ca64d05874795c8d4e3b4d4" charset="utf-8"></script><style type="text/css">#zhichiBtnBox {display: none!important;}#ZCChatFrame {left: 240px;}</style><script>var zhiManager = (getzhiSDKInstance());zhiManager.set('customBtn', 'true');</script></body></html><!--5.132.2.21675-->

print(anchors)

[{'name': ['\n 贾克虎丶虎神 ', '\n<i class="video-station-rank">| 排名19</i>\n'], 'number': ['152.0万']}, {'name':['\n 小师弟180', '\n'], 'number': ['58.1万']}, {'name': ['\n 君克解说 ', '\n'], 'number': ['46.7万']}, {'name': ['\n 熊猫Tv丶狮子汪 ', '\n'], 'number': ['7.5万']}, {'name': ['\n小丸仔卡特', '\n'], 'number': ['3234']}, {'name': ['\nLOL丶摇摆哥', '\n'], 'number': ['30.9万']}, {'name': ['\n S8全球总决赛', '\n '], 'number': ['21.8万']}, {'name': ['\n左手吸血鬼QAQ', '\n '], 'number': ['6.6万']}, {'name': ['\n 柚子ob', '\n '], 'number': ['4.1万']}, {'name': ['\n 请叫我梦哥哥', '\n '], 'number': ['3.8万']}, {'name': ['\n lol稳贱骨炼金', '\n '], 'number': ['3.4万']}, {'name': ['\n 朱晓飞五五五五', '\n '], 'number': ['2.5万']}, {'name': ['\n 小沁想吹空调吖', '\n '], 'number': ['2.2万']}, {'name': ['\n 宇宙大表哥', '\n '], 'number': ['1.9万']}, {'name': ['\n 又酱阿', '\n '], 'number': ['1.7万']}, {'name': ['\n 狼王沃李克', '\n '], 'number': ['1.5万']}, {'name': ['\n 熊猫伏念', '\n '], 'number': ['1.4万']}, {'name': ['\n 晓庄豹女', '\n '], 'number': ['8467']}, {'name': ['\n 猴王一心', '\n'],'number': ['8084']}, {'name': ['\n听白呀丶', '\n'], 'number': ['7833']}, {'name': ['\n 抗寒使者', '\n'], 'number': ['6906']}, {'name': ['\n BubbleBubble', '\n'], 'number': ['6802']}, {'name': ['\n 顺顺套路王 ', '\n'], 'number': ['6009']}, {'name': ['\n 可爱的苏韵儿 ', '\n'], 'number': ['5859']}, {'name': ['\n 一个很C的稻草人 ', '\n'], 'number': ['5809']}, {'name':['\n 小白菜嗷呜 ', '\n'], 'number': ['5573']}, {'name': ['\n 魔剑神无敌', '\n'], 'number': ['5168']}, {'name': ['\nPanda樱皇', '\n'], 'number': ['5036']}, {'name': ['\n 梦里来的小亦菲丶', '\n'], 'number': ['4815']}, {'name': ['\n 陈大G', '\n '], 'number': ['4645']}, {'name': ['\n 一情书一', '\n '], 'number': ['4584']}, {'name': ['\n杀鸡游戏俱乐部', '\n '], 'number': ['4581']}, {'name': ['\n 琳琪baby', '\n '], 'number': ['4505']}, {'name': ['\n这个赵信有丶C', '\n '], 'number': ['3844']}, {'name': ['\n 瓜神z', '\n '], 'number': ['3595']}, {'name': ['\n小马哥玩盖伦', '\n '], 'number': ['3424']}, {'name': ['\n 兰晨丶', '\n '], 'number': ['3420']}, {'name': ['\n 冰雪丶狐狸', '\n'], 'number': ['3233']}, {'name': 
['\n大学长丶', '\n'], 'number': ['3104']}, {'name': ['\n 小明伊芙琳', '\n'], 'number': ['3022']}, {'name': ['\n 熊猫TV丶萌阿琦i', '\n'], 'number': ['2810']}, {'name': ['\n 阿佑any ', '\n'], 'number': ['2707']}, {'name': ['\n 江西丶社会强', '\n'], 'number': ['2703']}, {'name': ['\n 熊猫TVsao马', '\n'], 'number': ['2644']}, {'name': ['\n 熊猫TV丶油菜花1 ', '\n'], 'number': ['2537']}, {'name': ['\n 熊猫TV丶小老鼠 ', '\n'], 'number': ['2249']}, {'name': ['\n 熊猫盖伦王', '\n'], 'number': ['2231']}, {'name': ['\n 七哥卡牌丶', '\n'], 'number': ['2210']}, {'name': ['\n 阿毛Fit', '\n '], 'number': ['2196']}, {'name': ['\n 暖暖猫神', '\n'], 'number': ['2188']}, {'name': ['\n 嗜血馒头', '\n '], 'number': ['2187']}, {'name': ['\n小兔儿甜', '\n '], 'number': ['2157']}, {'name': ['\n 无V情', '\n '], 'number': ['2065']}, {'name': ['\n LPL熊猫官方直播', '\n '], 'number': ['2049']}, {'name': ['\n金克喵的猫珥朵丶', '\n '], 'number': ['2044']}, {'name': ['\n头型睡炸的33', '\n '], 'number': ['2025']}, {'name': ['\n Dedizzz', '\n '], 'number': ['2002']}, {'name': ['\n 大表哥王者蛇女', '\n '], 'number': ['1998']}, {'name': ['\n 自闭症晚期患者z', '\n '], 'number': ['1930']}, {'name': ['\n想打职业的XMxx', '\n'], 'number': ['1920']}, {'name': ['\n 大洋洋y', '\n'], 'number': ['1918']}, {'name': ['\n准时不迟到的宁神', '\n'], 'number': ['1894']}, {'name': ['\n泰国隆z', '\n'], 'number': ['1882']}, {'name': ['\n 这个人帅到没朋友', '\n'], 'number': ['1872']}, {'name': ['\n 有个辅助叫瓜瓜a', '\n'], 'number': ['1858']}, {'name': ['\n 迟到不准时的岛屿', '\n'], 'number': ['1844']}, {'name': ['\n 黑夜剑魔', '\n'], 'number': ['1834']},{'name': ['\n 布依灬卡特', '\n'], 'number': ['1824']}, {'name': ['\n 希希天使S ', '\n'], 'number': ['1823']}, {'name': ['\n 梁老师的作死大头 ', '\n'], 'number': ['1800']}, {'name': ['\n New恩赐解脱 ', '\n'], 'number': ['1798']}, {'name': ['\n 大雄d啊', '\n'], 'number': ['1791']}, {'name': ['\n徐牛牛Zzz', '\n '], 'number': ['1762']}, {'name': ['\n 我的傻喵', '\n'], 'number': ['1753']}, {'name': ['\n我是巴卫酱', '\n'], 'number': ['1740']}, {'name': ['\n 啊一丶Ay1zzz', '\n'], 'number': ['1724']}, {'name': ['\n青蛙OB', '\n'], 'number': ['1680']}, 
{'name': ['\n叫我EVEN好了', '\n '], 'number': ['1655']}, {'name': ['\n 热不息恶木荫丶', '\n'], 'number': ['1583']}, {'name': ['\n 叶芯丶', '\n'], 'number': ['1555']}, {'name': ['\n 暴躁小十一', '\n '], 'number': ['1551']}, {'name': ['\n 冷面寒枪人马神', '\n '], 'number': ['1526']}, {'name': ['\n 浩哥拉风依旧', '\n'], 'number': ['1515']}, {'name': ['\n 熊猫直播丶小沣酱', '\n'], 'number': ['1490']}, {'name': ['\n 夢遊王者丶画小雯', '\n'], 'number': ['1481']}, {'name': ['\n 长路漫漫剑圣作伴', '\n'], 'number': ['1449']}, {'name': ['\n 妖娆的考拉', '\n'], 'number': ['1447']}, {'name': ['\n 伽耳伽耳', '\n'], 'number': ['1415']}, {'name': ['\n V神参上', '\n'], 'number': ['1379']},{'name': ['\n 可爱小仙女丶 ', '\n'], 'number': ['1379']}, {'name': ['\n 杨洋洋洋i', '\n'], 'number': ['1338']}, {'name': ['\n 可口可乐的克克 ', '\n'], 'number': ['1336']}, {'name': ['\n 苏璞呀丶', '\n'], 'number': ['1305']}, {'name': ['\n超级无敌阿东锅 ', '\n'], 'number': ['1244']}, {'name': ['\n EnnnMing', '\n'], 'number': ['1237']}, {'name': ['\n我是萌宝', '\n '], 'number': ['1204']}, {'name': ['\n Panda丶浅唱小生', '\n '], 'number': ['1179']}, {'name': ['\n 凌峰OwO', '\n '], 'number': ['1168']}, {'name': ['\n 请叫我越塔怪', '\n'], 'number': ['1161']}, {'name': ['\n 熊猫tv阿铖', '\n '], 'number': ['1124']}, {'name': ['\n 学习学习在学习', '\n '], 'number': ['1098']}, {'name': ['\n小允呀', '\n '], 'number': ['1074']}, {'name': ['\n叫我王者飞啦', '\n '], 'number': ['1071']}, {'name': ['\n Panda丶冰冰', '\n '], 'number': ['1043']}, {'name': ['\n 水壶0417', '\n'], 'number': ['1002']}, {'name': ['\n 熊猫TV丶花伦', '\n'],'number': ['893']}, {'name': ['\n 小段啊丶', '\n'], 'number': ['885']}, {'name': ['\n 不懂老师yc', '\n'], 'number': ['862']}, {'name': ['\n 有毒i吸血鬼', '\n'], 'number': ['851']}, {'name': ['\n 高调的火星人 ', '\n'], 'number': ['846']}, {'name': ['\n 爱唱歌的小南丶 ', '\n'], 'number': ['825']}, {'name':['\n 熊猫尼古拉斯胖虎 ', '\n'], 'number': ['820']}, {'name': ['\n 爱吃板栗123 ', '\n'], 'number': ['817']}, {'name': ['\n 熊猫丶老白白', '\n'], 'number': ['817']}, {'name': ['\n落羽李青','\n'], 'number': ['813']}, {'name': ['\n 是欢欢呀丶', '\n '], 'number': ['808']}, {'name': 
['\n面包强', '\n '], 'number': ['807']}]

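The console output shows that the streamer names and popularity values extracted from the HTML contain many meaningless characters, such as spaces and newlines. Each name list also carries a stray second match, so only its first element is useful. The data therefore needs to be refined to strip out the unwanted content.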

print(targets)

[{'name': '贾克虎丶虎神', 'number': '152.0万'}, {'name': '小师弟180', 'number': '58.1万'}, {'name': '君克解说', 'number': '46.7万'}, {'name': '熊猫Tv丶狮子汪', 'number': '7.5万'}, {'name': '小丸仔卡特', 'number': '3234'}, {'name': 'LOL丶摇摆哥', 'number': '30.9万'}, {'name': 'S8全球总决赛', 'number': '21.8万'}, {'name': '左手吸血鬼QAQ','number': '6.6万'}, {'name': '柚子ob', 'number': '4.1万'}, {'name': '请叫我梦哥哥', 'number': '3.8万'}, {'name':'lol稳贱骨炼金', 'number': '3.4万'}, {'name': '朱晓飞五五五五', 'number': '2.5万'}, {'name': '小沁想吹空调吖', 'number': '2.2万'}, {'name': '宇宙大表哥', 'number': '1.9万'}, {'name': '又酱阿', 'number': '1.7万'}, {'name': '狼王沃李克', 'number': '1.5万'}, {'name': '熊猫伏念', 'number': '1.4万'}, {'name': '晓庄豹女', 'number': '8467'}, {'name': '猴王一心', 'number': '8084'}, {'name': '听白呀丶', 'number': '7833'}, {'name': '抗寒使者', 'number': '6906'}, {'name': 'BubbleBubble', 'number': '6802'}, {'name': '顺顺套路王', 'number': '6009'}, {'name': '可爱的苏韵儿', 'number': '5859'}, {'name': '一个很C的稻草人', 'number': '5809'}, {'name': '小白菜嗷呜', 'number': '5573'},{'name': '魔剑神无敌', 'number': '5168'}, {'name': 'Panda樱皇', 'number': '5036'}, {'name': '梦里来的小亦菲丶', 'number': '4815'}, {'name': '陈大G', 'number': '4645'}, {'name': '一情书一', 'number': '4584'}, {'name': '杀鸡游戏俱乐部', 'number': '4581'}, {'name': '琳琪baby', 'number': '4505'}, {'name': '这个赵信有丶C', 'number': '3844'},{'name': '瓜神z', 'number': '3595'}, {'name': '小马哥玩盖伦', 'number': '3424'}, {'name': '兰晨丶', 'number': '3420'}, {'name': '冰雪丶狐狸', 'number': '3233'}, {'name': '大学长丶', 'number': '3104'}, {'name': '小明伊芙琳', 'number': '3022'}, {'name': '熊猫TV丶萌阿琦i', 'number': '2810'}, {'name': '阿佑any', 'number': '2707'}, {'name': '江西丶社会强', 'number': '2703'}, {'name': '熊猫TVsao马', 'number': '2644'}, {'name': '熊猫TV丶油菜花1', 'number': '2537'}, {'name': '熊猫TV丶小老鼠', 'number': '2249'}, {'name': '熊猫盖伦王', 'number': '2231'}, {'name': '七哥卡牌丶', 'number': '2210'}, {'name': '阿毛Fit', 'number': '2196'}, {'name': '暖暖猫神', 'number': '2188'}, {'name': '嗜血馒头', 'number': '2187'}, {'name': '小兔儿甜', 'number': '2157'}, {'name': '无V情', 'number': '2065'}, 
{'name': 'LPL熊猫官方直播', 'number': '2049'}, {'name': '金克喵的猫珥朵丶', 'number': '2044'}, {'name': '头型睡炸的33', 'number': '2025'}, {'name': 'Dedizzz', 'number': '2002'}, {'name': '大表哥王者蛇女', 'number': '1998'}, {'name': '自闭症晚期患者z', 'number': '1930'}, {'name': '想打职业的XMxx', 'number': '1920'}, {'name': '大洋洋y', 'number': '1918'}, {'name': '准时不迟到的宁神', 'number': '1894'}, {'name': '泰国隆z', 'number': '1882'}, {'name': '这个人帅到没朋友', 'number': '1872'}, {'name': '有个辅助叫瓜瓜a', 'number': '1858'}, {'name': '迟到不准时的岛屿', 'number': '1844'}, {'name': '黑夜剑魔', 'number': '1834'}, {'name': '布依灬卡特', 'number': '1824'}, {'name': '希希天使S', 'number': '1823'}, {'name': '梁老师的作死大头', 'number': '1800'}, {'name': 'New恩赐解脱', 'number':'1798'}, {'name': '大雄d啊', 'number': '1791'}, {'name': '徐牛牛Zzz', 'number': '1762'}, {'name': '我的傻喵', 'number': '1753'}, {'name': '我是巴卫酱', 'number': '1740'}, {'name': '啊一丶Ay1zzz', 'number': '1724'}, {'name': '青蛙OB', 'number': '1680'}, {'name': '叫我EVEN好了', 'number': '1655'}, {'name': '热不息恶木荫丶', 'number': '1583'}, {'name': '叶芯丶', 'number': '1555'}, {'name': '暴躁小十一', 'number': '1551'}, {'name': '冷面寒枪人马神', 'number': '1526'}, {'name': '浩哥拉风依旧', 'number': '1515'}, {'name': '熊猫直播丶小沣酱', 'number': '1490'}, {'name': '夢遊王者丶画小雯', 'number': '1481'}, {'name': '长路漫漫剑圣作伴', 'number': '1449'}, {'name': '妖娆的考拉', 'number': '1447'}, {'name': '伽耳伽耳', 'number': '1415'}, {'name': 'V神参上', 'number': '1379'}, {'name': '可爱小仙女丶', 'number': '1379'}, {'name': '杨洋洋洋i', 'number': '1338'}, {'name': '可口可乐的克克', 'number': '1336'}, {'name': '苏璞呀丶', 'number': '1305'}, {'name': '超级无敌阿东锅', 'number': '1244'}, {'name': 'EnnnMing','number': '1237'}, {'name': '我是萌宝', 'number': '1204'}, {'name': 'Panda丶浅唱小生', 'number': '1179'}, {'name': '凌峰OwO', 'number': '1168'}, {'name': '请叫我越塔怪', 'number': '1161'}, {'name': '熊猫tv阿铖', 'number': '1124'}, {'name': '学习学习在学习', 'number': '1098'}, {'name': '小允呀', 'number': '1074'}, {'name': '叫我王者飞啦', 'number': '1071'}, {'name': 'Panda丶冰冰', 'number': '1043'}, {'name': '水壶0417', 'number': '1002'}, {'name': 
'熊猫TV丶花伦', 'number': '893'}, {'name': '小段啊丶', 'number': '885'}, {'name': '不懂老师yc', 'number': '862'},{'name': '有毒i吸血鬼', 'number': '851'}, {'name': '高调的火星人', 'number': '846'}, {'name': '爱唱歌的小南丶', 'number': '825'}, {'name': '熊猫尼古拉斯胖虎', 'number': '820'}, {'name': '爱吃板栗123', 'number': '817'}, {'name': '熊猫丶老白白', 'number': '817'}, {'name': '落羽李青', 'number': '813'}, {'name': '是欢欢呀丶', 'number': '808'}, {'name': '面包强', 'number': '807'}]
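The cleanup that turns the raw matches into the flat list above can be sketched in isolation. The function name `refine` and the two sample records are assumptions made for this example; the guard that skips records with an empty match list is an addition for robustness and is not part of the article's code.

```python
def refine(anchors):
    """Keep the first match of each field, strip whitespace, flatten the record."""
    targets = []
    for anchor in anchors:
        # Skip records where either regular expression matched nothing;
        # indexing [0] on an empty list would raise IndexError.
        if not anchor['name'] or not anchor['number']:
            continue
        targets.append({
            'name': anchor['name'][0].strip(),
            'number': anchor['number'][0].strip(),
        })
    return targets


raw = [
    {'name': ['\n 小师弟180', '\n'], 'number': ['58.1万']},
    {'name': [], 'number': ['100']},  # hypothetical record with no name match
]
print(refine(raw))
```

`str.strip()` with no arguments removes leading and trailing spaces, tabs and newlines, which is all the noise visible in the raw output.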

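As the last step, the __sort() method sorts the refined data by streamer popularity, and the __show() method prints the final result. In the output, 万 means ten thousand, so 152.0万 is 1,520,000.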

rank 1:贾克虎丶虎神 152.0万

rank 2:小师弟180 58.1万

rank 3:君克解说 46.7万

rank 4:LOL丶摇摆哥 30.9万

rank 5:S8全球总决赛 21.8万

rank 6:熊猫Tv丶狮子汪 7.5万

rank 7:左手吸血鬼QAQ 6.6万

rank 8:柚子ob 4.1万

rank 9:请叫我梦哥哥 3.8万

rank 10:lol稳贱骨炼金 3.4万

rank 11:朱晓飞五五五五 2.5万

rank 12:小沁想吹空调吖 2.2万

rank 13:宇宙大表哥 1.9万

rank 14:又酱阿 1.7万

rank 15:狼王沃李克 1.5万

rank 16:熊猫伏念 1.4万

rank 17:晓庄豹女 8467

rank 18:猴王一心 8084

rank 19:听白呀丶 7833

rank 20:抗寒使者 6906

rank 21:BubbleBubble 6802

rank 22:顺顺套路王 6009

rank 23:可爱的苏韵儿 5859

rank 24:一个很C的稻草人 5809

rank 25:小白菜嗷呜 5573

rank 26:魔剑神无敌 5168

rank 27:Panda樱皇 5036

rank 28:梦里来的小亦菲丶 4815

rank 29:陈大G 4645

rank 30:一情书一 4584

rank 31:杀鸡游戏俱乐部 4581

rank 32:琳琪baby 4505

rank 33:这个赵信有丶C 3844

rank 34:瓜神z 3595

rank 35:小马哥玩盖伦 3424

rank 36:兰晨丶 3420

rank 37:小丸仔卡特 3234

rank 38:冰雪丶狐狸 3233

rank 39:大学长丶 3104

rank 40:小明伊芙琳 3022

rank 41:熊猫TV丶萌阿琦i 2810

rank 42:阿佑any 2707

rank 43:江西丶社会强 2703

rank 44:熊猫TVsao马 2644

rank 45:熊猫TV丶油菜花1 2537

rank 46:熊猫TV丶小老鼠 2249

rank 47:熊猫盖伦王 2231

rank 48:七哥卡牌丶 2210

rank 49:阿毛Fit 2196

rank 50:暖暖猫神 2188

rank 51:嗜血馒头 2187

rank 52:小兔儿甜 2157

rank 53:无V情 2065

rank 54:LPL熊猫官方直播 2049

rank 55:金克喵的猫珥朵丶 2044

rank 56:头型睡炸的33 2025

rank 57:Dedizzz 2002

rank 58:大表哥王者蛇女 1998

rank 59:自闭症晚期患者z 1930

rank 60:想打职业的XMxx 1920

rank 61:大洋洋y 1918

rank 62:准时不迟到的宁神 1894

rank 63:泰国隆z 1882

rank 64:这个人帅到没朋友 1872

rank 65:有个辅助叫瓜瓜a 1858

rank 66:迟到不准时的岛屿 1844

rank 67:黑夜剑魔 1834

rank 68:布依灬卡特 1824

rank 69:希希天使S 1823

rank 70:梁老师的作死大头 1800

rank 71:New恩赐解脱 1798

rank 72:大雄d啊 1791

rank 73:徐牛牛Zzz 1762

rank 74:我的傻喵 1753

rank 75:我是巴卫酱 1740

rank 76:啊一丶Ay1zzz 1724

rank 77:青蛙OB 1680

rank 78:叫我EVEN好了 1655

rank 79:热不息恶木荫丶 1583

rank 80:叶芯丶 1555

rank 81:暴躁小十一 1551

rank 82:冷面寒枪人马神 1526

rank 83:浩哥拉风依旧 1515

rank 84:熊猫直播丶小沣酱 1490

rank 85:夢遊王者丶画小雯 1481

rank 86:长路漫漫剑圣作伴 1449

rank 87:妖娆的考拉 1447

rank 88:伽耳伽耳 1415

rank 89:V神参上 1379

rank 90:可爱小仙女丶 1379

rank 91:杨洋洋洋i 1338

rank 92:可口可乐的克克 1336

rank 93:苏璞呀丶 1305

rank 94:超级无敌阿东锅 1244

rank 95:EnnnMing 1237

rank 96:我是萌宝 1204

rank 97:Panda丶浅唱小生 1179

rank 98:凌峰OwO 1168

rank 99:请叫我越塔怪 1161

rank 100:熊猫tv阿铖 1124

rank 101:学习学习在学习 1098

rank 102:小允呀 1074

rank 103:叫我王者飞啦 1071

rank 104:Panda丶冰冰 1043

rank 105:水壶0417 1002

rank 106:熊猫TV丶花伦 893

rank 107:小段啊丶 885

rank 108:不懂老师yc 862

rank 109:有毒i吸血鬼 851

rank 110:高调的火星人 846

rank 111:爱唱歌的小南丶 825

rank 112:熊猫尼古拉斯胖虎 820

rank 113:爱吃板栗123 817

rank 114:熊猫丶老白白 817

rank 115:落羽李青 813

rank 116:是欢欢呀丶 808

rank 117:面包强 807
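The ranking above depends on turning strings such as 152.0万 and 3234 into comparable numbers. A minimal sketch of such a sort key follows; the function name `popularity`, the fallback of 0.0 for unparseable text, and the three sample rows are assumptions made for this example. Matching the decimal point matters: a digits-only pattern would read 7.5万 as 7 and lose the fractional part.

```python
import re


def popularity(text):
    """Convert a string such as '152.0万' or '3234' into a float."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        return 0.0  # assumption: treat unparseable values as zero
    value = float(match.group())
    if '万' in text:  # 万 means ten thousand
        value *= 10000
    return value


rows = [
    {'name': '小丸仔卡特', 'number': '3234'},
    {'name': '熊猫Tv丶狮子汪', 'number': '7.5万'},
    {'name': '贾克虎丶虎神', 'number': '152.0万'},
]
ranked = sorted(rows, key=lambda row: popularity(row['number']), reverse=True)
for rank, row in enumerate(ranked, start=1):
    print('rank ' + str(rank) + ':' + row['name'] + ' ' + row['number'])
```

Because `sorted` is stable, streamers with equal popularity keep the order in which they appeared on the page.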

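The data above was scraped at 6 p.m. on November 30. Because of the nature of live-streaming sites, data scraped at different times of day varies considerably. Besides the LOL section of Panda, the same method works for the site's other sections: simply replace the trailing lol in the url with the name of the other section. Scraping a different live-streaming site, however, requires changes elsewhere in the code, because each site has its own HTML structure; you have to analyze that site's pages and write matching regular expressions to extract the data you want.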
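One way to make those two kinds of change explicit is to keep the URL prefix and the patterns together in a small per-site configuration. The sketch below is only an illustration: `SiteConfig`, `category_url`, `parse`, the sample HTML fragment, and the section name `some-other-section` are all hypothetical, while the Panda patterns are the ones used in this article.

```python
import re
from dataclasses import dataclass


@dataclass
class SiteConfig:
    base_url: str        # URL prefix shared by every section of the site
    root_pattern: str    # one match per streamer block
    name_pattern: str    # streamer name inside a block
    number_pattern: str  # popularity value inside a block

    def category_url(self, category):
        """Build the page URL for one section by swapping the last path segment."""
        return self.base_url.rstrip('/') + '/' + category


def parse(config, html):
    """Return (name, number) pairs found in html using the site's patterns."""
    rows = []
    for block in re.findall(config.root_pattern, html):
        names = re.findall(config.name_pattern, block)
        numbers = re.findall(config.number_pattern, block)
        if names and numbers:
            rows.append((names[0].strip(), numbers[0].strip()))
    return rows


PANDA = SiteConfig(
    base_url='https://www.panda.tv/cate',
    root_pattern=r'<div class="video-info">([\s\S]*?)</div>',
    name_pattern=r'</i>([\s\S]*?)</span>',
    number_pattern=r'<span class="video-number">([\s\S]*?)</span>',
)

print(PANDA.category_url('lol'))
print(PANDA.category_url('some-other-section'))  # hypothetical section name

# Hypothetical fragment shaped like one streamer block from the console output.
fragment = ('<div class="video-info"><span class="video-nickname">'
            '<i class="icon-hostlevel"></i>面包强 </span>'
            '<span class="video-number">807</span></div>')
print(parse(PANDA, fragment))
```

Supporting another site would then mean writing a second `SiteConfig` after inspecting that site's HTML, without touching `parse`.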

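One last note: where there are crawlers, there are naturally anti-crawler measures. I tried scraping data from "某鱼" (literally "a certain fish") and found that its HTML pages cannot be fetched through the request object. There is still a lot to learn, so keep at it.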
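The article does not say why that request failed. One common cause is a server that rejects the default Python-urllib User-Agent, and a first thing to try is sending a browser-like header and reporting the HTTP status when the server refuses. The sketch below assumes that cause; the names `build_request` and `fetch`, the User-Agent string, and the example.com URL are illustrative, and this will not help if the page content is rendered by JavaScript or the site uses stronger protections.

```python
from urllib import error, request

# Assumption: a generic browser-like User-Agent string, for illustration only.
BROWSER_UA = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/120.0 Safari/537.36')


def build_request(url):
    """Create a Request that carries a browser-like User-Agent header."""
    return request.Request(url, headers={'User-Agent': BROWSER_UA})


def fetch(url, timeout=10):
    """Return the page text, or None if the server refuses or cannot be reached."""
    try:
        with request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read().decode('utf-8', errors='replace')
    except error.HTTPError as exc:  # the server answered with an error status
        print('server answered with HTTP status', exc.code)
    except error.URLError as exc:  # DNS failure, refused connection, and so on
        print('could not reach server:', exc.reason)
    return None


# Building the request does not touch the network.
req = build_request('https://example.com/')
print(req.get_header('User-agent'))
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would hide the status code.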
