[Python] Simulating login and scraping job listings from Lagou (selenium + phantomjs)

Date: 2020-10-02 21:41:24

Environment

Python 3.5
pip install selenium
phantomjs-2.1.1
pip install pyquery

Code

# -*- coding:utf-8 -*-
# Written against Selenium 3.x: PhantomJS support and the find_element_by_*
# helpers are no longer available in current Selenium 4 releases.
import io
import sys
import time

from pyquery import PyQuery as pq
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Re-wrap stdout so that printing Chinese text does not raise an encoding error
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')

# Set the User-Agent request header for PhantomJS
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"
)
driver = webdriver.PhantomJS(
    desired_capabilities=dcap,
    executable_path=r"C:\Users\DELL\Desktop\Scrapy\phantomjs-2.1.1-windows\bin\phantomjs.exe",
)
driver.set_window_size(400, 100)


# Simulate login: fill in the username and password fields, then submit the form
def login(login_url, username, password):
    print("begin login...")
    try:
        driver.get(login_url)
        driver.find_element_by_css_selector(".input_item.clearfix[data-propertyname='username'] input").send_keys(username)
        driver.find_element_by_css_selector(".input_item.clearfix[data-propertyname='password'] input").send_keys(password)
        driver.find_element_by_css_selector(".input_item.btn_group.clearfix[data-propertyname='submit'] input").click()
    except Exception as e:
        print("login wrong...", e)


# Simulate a search: type the position name into the search box and click the button
def search_position(position_name):
    print("search position {}".format(position_name))
    try:
        search_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "search_input")))
        search_input.send_keys(position_name)
        search_btn = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "search_button")))
        search_btn.click()
    except Exception as e:
        print("search wrong...", e)


# Parse the result pages one by one, recursively.
# The recursion ends only when an exception is raised, for example when the
# "next page" element can no longer be found or clicked.
def parse_html():
    print("begin parse html...")
    try:
        next_page_label = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".item_con_pager .pager_container span:last-child")))
        html = pq(driver.page_source)
        items = html("#s_position_list .item_con_list li.con_list_item.default_list").items()
        for item in items:
            print(item.attr("data-company"))
            print(item.attr("data-positionname"))
            print(item.attr("data-salary"))
            print(item("a.position_link").attr("href"))
            print("\n")
        next_page_label.click()
        time.sleep(3)  # give the next page time to load
        parse_html()
    except Exception as e:
        print(str(e))


if __name__ == "__main__":
    # The scheme and host are missing from this URL in the source; it is reproduced as-is.
    login_url = "/login/login.html?ts=1508055021059&serviceId=lagou&service=https%253A%252F%%252F&action=login&signature=101A9F09764AD83E3E2A035A1506AF7A"
    username = "your_username"
    password = "your_password"
    login(login_url, username, password)
    search_position("python")
    parse_html()
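
Because parse_html calls itself once per page and stops only when an exception is raised, a long result list deepens the call stack with every page. A loop with an explicit page limit is one possible alternative. The sketch below is not the original author's code: crawl_pages, read_items, go_to_next_page and FakeSite are hypothetical names, and the fake pages stand in for a real browser session so the snippet runs on its own.

```python
from typing import Callable, Iterable, List


def crawl_pages(
    read_items: Callable[[], Iterable[dict]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 30,
) -> List[dict]:
    """Collect items page by page with a loop instead of recursion.

    Both callbacks are hypothetical: read_items returns the items on the
    current page, go_to_next_page moves to the next page and returns False
    when there is no further page.
    """
    collected: List[dict] = []
    for _ in range(max_pages):
        collected.extend(read_items())
        if not go_to_next_page():
            break
    return collected


class FakeSite:
    """Stand-in for a browser session, holding made-up pages of items."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def read_items(self):
        return self.pages[self.index]

    def go_to_next_page(self):
        if self.index + 1 >= len(self.pages):
            return False
        self.index += 1
        return True


site = FakeSite([[{"id": 1}, {"id": 2}], [{"id": 3}], [{"id": 4}]])
results = crawl_pages(site.read_items, site.go_to_next_page)
print([item["id"] for item in results])
```

With a real driver, read_items would wrap the pyquery extraction and go_to_next_page would wrap the click on the pager element, returning False once the click fails or the button is disabled.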

Result
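
For every listing item the script prints four fields: the company, the position name, the salary and the link to the detail page. The sketch below reproduces that extraction step offline. The HTML is made up for illustration and only modelled on the selectors used in parse_html, and the standard-library html.parser is swapped in for pyquery so that the snippet runs without a browser or extra packages. PositionParser and SAMPLE_HTML are hypothetical names.

```python
from html.parser import HTMLParser


class PositionParser(HTMLParser):
    """Collect the data-* attributes of <li class="con_list_item ..."> items."""

    def __init__(self):
        super().__init__()
        self.positions = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "con_list_item" in classes:
            # A new listing item starts: read its data-* attributes
            self._in_item = True
            self.positions.append({
                "company": attrs.get("data-company"),
                "position": attrs.get("data-positionname"),
                "salary": attrs.get("data-salary"),
                "link": None,
            })
        elif tag == "a" and self._in_item and "position_link" in classes:
            # The detail-page link sits inside the listing item
            self.positions[-1]["link"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False


# Made-up sample markup, not real data from the site
SAMPLE_HTML = """
<ul class="item_con_list">
  <li class="con_list_item default_list" data-company="ExampleCo"
      data-positionname="Python Developer" data-salary="10k-20k">
    <a class="position_link" href="https://example.com/jobs/1.html">Python Developer</a>
  </li>
  <li class="con_list_item default_list" data-company="SampleSoft"
      data-positionname="Backend Engineer" data-salary="15k-25k">
    <a class="position_link" href="https://example.com/jobs/2.html">Backend Engineer</a>
  </li>
</ul>
"""

parser = PositionParser()
parser.feed(SAMPLE_HTML)
for position in parser.positions:
    print(position["company"])
    print(position["position"])
    print(position["salary"])
    print(position["link"])
    print()
```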
