Advanced Python: Two Ways to Log In to GitHub Automatically with Scrapy (POST, FormRequest, from_response)

Date: 2018-10-31 12:48:58

Contents:

1. Logging in to GitHub with scrapy.FormRequest()
   github1.py
2. Logging in to GitHub with scrapy.FormRequest.from_response()
   github2.py
   ps.py
3. Points to note

1. Logging in to GitHub with scrapy.FormRequest()

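Requirement: log in to GitHub automatically by submitting the login form.

Requirements analysis:

1. Login page: /login

2. Form submission endpoint: /session

3. Form data sent with the login request:

commit: Sign in
authenticity_token: VS+fpLCrOk/5kzW21z4TvUgAhT3TWyork1NQAmZ4Dv7z4noiYDFxNJ3VD18SKskyPrcyGMeo3KADeGB2PSORPw==
ga_id: 2106354846.1594901172
login: xxxxx
password: 123456
webauthn-support: supported
webauthn-iuvpaa-support: unsupported
return_to:
required_field_0ac1:
timestamp: 1599141753983
timestamp_secret: 3a2dd1bb343778097362c2440e688357aefc732642ccb354ba5a4da42889181e

4. Submit the form several times with different passwords and note which fields change between submissions.

5. Inspecting the page HTML shows that the values of authenticity_token, timestamp and timestamp_secret are embedded in the login page, so the spider can read them from the response before posting.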
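The comparison of repeated submissions described above can be sketched with plain dictionaries. The two captures below use made-up placeholder values rather than real GitHub data, and `changed_fields` is a hypothetical helper name. The point is only that fields whose values differ between attempts have to be read from the page on each login instead of being hard-coded.

```python
def changed_fields(first, second):
    """Return the sorted names of fields whose values differ between two captures."""
    names = set(first) | set(second)
    return sorted(name for name in names if first.get(name) != second.get(name))


# Placeholder captures of two login attempts (illustrative values only).
attempt_1 = {
    "commit": "Sign in",
    "authenticity_token": "token-aaa",
    "login": "xxxxx",
    "password": "123456",
    "timestamp": "1599141753983",
    "timestamp_secret": "secret-aaa",
}
attempt_2 = {
    "commit": "Sign in",
    "authenticity_token": "token-bbb",
    "login": "xxxxx",
    "password": "654321",
    "timestamp": "1599141799999",
    "timestamp_secret": "secret-bbb",
}

print(changed_fields(attempt_1, attempt_2))
# ['authenticity_token', 'password', 'timestamp', 'timestamp_secret']
```

Apart from the password, which was changed on purpose, the remaining names are the ones that must be extracted from the login page.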

github1.py

import scrapy

from ps import k_login, k_password


class GloginSpider(scrapy.Spider):
    name = 'github1'
    # Scheme and host are not shown in the values below; Scrapy requires absolute URLs in start_urls.
    allowed_domains = ['']
    start_urls = ['/login']
    url = '/session'

    def parse(self, response):
        commit = 'Sign in'
        # extract_first() returns the first match, or None when nothing matches.
        # .extract()[0] and [0].extract() return the same value but raise IndexError when nothing matches.
        authenticity_token = response.xpath('//input[@name="authenticity_token"]/@value').extract_first()
        # In testing, ga_id was empty before login, so it is left out of the form.
        # ga_id = response.xpath('//meta[@name="octolytics-dimension-ga_id"]/@content').extract()
        login = k_login
        password = k_password
        # Two inputs have "timestamp" in their name: timestamp first, then timestamp_secret.
        timestamps = response.xpath('//input[contains(@name,"timestamp")]/@value').extract()
        timestamp, timestamp_secret = timestamps[0], timestamps[1]
        # print(authenticity_token)
        # print(timestamp, timestamp_secret)

        # Form fields to submit.
        form_data = {
            'commit': commit,
            'authenticity_token': authenticity_token,
            # 'ga_id': ga_id,
            'login': login,
            'password': password,
            'webauthn-support': 'supported',
            'webauthn-iuvpaa-support': 'unsupported',
            'timestamp': timestamp,
            'timestamp_secret': timestamp_secret,
        }
        # scrapy.Request defaults to GET. FormRequest URL-encodes formdata
        # and defaults to POST when formdata is given.
        yield scrapy.FormRequest(
            url=response.urljoin(self.url),  # resolve against the login page URL
            formdata=form_data,
            callback=self.after_login,
        )

    def after_login(self, response):
        # Save the page returned after login so the result can be checked by hand.
        # print(response.body.decode('utf-8'))
        with open('./github.html', 'w', encoding='utf-8') as f:
            f.write(response.body.decode('utf-8'))
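To make the POST step more concrete: FormRequest sends `formdata` as a URL-encoded request body. The sketch below approximates that encoding with the standard library only; it is not Scrapy's own code, and the field values are placeholders. It also shows why a token containing `+`, `/` or `=` (as the authenticity_token does) has to be encoded rather than pasted into the body verbatim.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder form fields; the token value is invented for illustration.
form_data = {
    "commit": "Sign in",
    "authenticity_token": "abc+/==",
    "login": "xxxxx",
    "password": "123456",
}

# Encode the fields the way an application/x-www-form-urlencoded body is built.
body = urlencode(form_data)
print(body)
# commit=Sign+in&authenticity_token=abc%2B%2F%3D%3D&login=xxxxx&password=123456

# Decoding the body gives back the original values.
decoded = {name: values[0] for name, values in parse_qs(body).items()}
print(decoded == form_data)
# True
```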

2. Logging in to GitHub with scrapy.FormRequest.from_response()

github2.py

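import scrapy

from ps import k_login, k_password


class Gitlog2Spider(scrapy.Spider):
    name = 'github2'
    allowed_domains = ['']
    start_urls = ['/login']

    def parse(self, response):
        # FormRequest.from_response() reads the form found in the response,
        # so fields already present in the page source (such as the hidden
        # token and timestamp inputs) do not have to be supplied again.
        # This makes it a quick way to implement a login.
        yield scrapy.FormRequest.from_response(
            # The response that contains the login form.
            response=response,
            # Only the fields that need to be filled in or overridden.
            formdata={'login': k_login, 'password': k_password},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Save the page returned after login so the result can be checked by hand.
        # print(response.body.decode('utf-8'))
        with open('./github2.html', 'w', encoding='utf-8') as f:
            f.write(response.body.decode('utf-8'))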
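The idea behind from_response() can be illustrated without Scrapy: collect the input fields that are already in the form, then overlay the values supplied by the caller. The sketch below is a simplified stand-in built on the standard library, not Scrapy's implementation, which also handles things like multiple forms and other element types. `FormFieldCollector`, `build_form_data` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[name] = attributes.get("value") or ""


def build_form_data(html, overrides):
    """Merge the fields found in the page with caller-supplied values."""
    collector = FormFieldCollector()
    collector.feed(html)
    merged = dict(collector.fields)
    merged.update(overrides)
    return merged


# Invented login page with placeholder values.
LOGIN_PAGE = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="token-aaa">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="hidden" name="timestamp" value="1599141753983">
  <input type="hidden" name="timestamp_secret" value="secret-aaa">
  <input type="submit" name="commit" value="Sign in">
</form>
"""

form_data = build_form_data(LOGIN_PAGE, {"login": "xxxxx", "password": "123456"})
print(form_data)
```

Only `login` and `password` come from the caller; the token and timestamp fields are taken from the page, which is why github2.py can be so much shorter than github1.py.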