1200字范文,内容丰富有趣,写作的好帮手!
1200字范文 > 【python 淘宝爬虫】python 淘宝店铺名称 旺旺 销售量 抓取

【python 淘宝爬虫】python 淘宝店铺名称 旺旺 销售量 抓取

时间:2021-04-03 23:55:55

相关推荐

【python 淘宝爬虫】python 淘宝店铺名称 旺旺 销售量 抓取

一、需求分析

抓取淘宝店铺名称,旺旺,销售量

二、效果展示

三、实现源代码

# encoding: utf-8import sysreload(sys)sys.setdefaultencoding('utf-8')import pandas as pdimport timeimport retime1=time.time()import requestsseller=[]shop=[]sale=[]city=[]type=[]#################################定义城市列表###########################lista=["郑州市","洛阳市","开封市","南阳市","安阳市","商丘市","新乡市","平顶山市","许昌市","焦作市","周口市","信阳市","驻马店市","鹤壁市","濮阳市","漯河市","三门峡市","济源市","长治市","晋城市","运城市",\"聊城市","菏泽市","宿州市","淮北市","阜阳市","亳州市","蚌埠市","邢台市","邯郸市"]############################定义类别(在这里修改,只需要修改类别)###############aa="女鞋"#####################################################循环抓取###################################for w in lista:print ".............................................................."+w+"...................................................."for i in range(1,101):try:print "正在抓取第"+str(i)+"页...................."url = "/search?app=shopsearch&spm=a230r.7195193.0.0.ShnhPc&q="+aa+"&tracelog=shopsearchnoqcat&loc="+w+"&isb=0&shop_type=&ratesum=&s="+str((i-1)*20)html=requests.get(url).contentseller1=re.findall('"nick":"(.*?)","provcity":',html,re.S)for each in seller1:print eachseller.append(each)sale1=re.findall('"totalsold":(.*?),"procnt"',html,re.S)for each in sale1:print eachcity.append(w)type.append(aa)sale.append(each)shop1=re.findall('"rawTitle":"(.*?)",',html,re.S)for each in shop1:print each.replace(" ","")shop.append(each.replace(" ",""))except:pass###################################打印长度########################################print len(shop),len(seller),len(sale),len(city),len(type)########################数据框#################################################data=pd.DataFrame({'店铺名':shop,"卖家":seller,"销量":sale,"城市":city,"类型":type})print data# 写出excel############################文件名称记得修改######################################################writer = pd.ExcelWriter(r'C:\\taobao\\taobao4.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})data.to_excel(writer, index=False)writer.close()time2 = time.time()print u'ok,爬虫结束!'print u'总共耗时:' + str(time2 - time1) + 's'

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。