Fetching Dynamically Loaded Web Page Data with Python
0. What is XHR?
XHR is the XMLHttpRequest object, that is, the object that Ajax functionality relies on. The Ajax functions in jQuery are a wrapper around XHR.
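Because an XHR call is an ordinary HTTP request, a script can in principle issue the same request itself. The sketch below uses only the standard library to build (without sending) such a request. The host `example.com` and the helper name `build_xhr_style_request` are placeholders for illustration and do not come from the original article. jQuery adds an `X-Requested-With: XMLHttpRequest` header to same-origin Ajax calls, and some servers may check for it, so the sketch sets that header too.

```python
from urllib.request import Request


def build_xhr_style_request(url):
    """Return an unsent GET request carrying the header jQuery adds to Ajax calls."""
    return Request(url, headers={"X-Requested-With": "XMLHttpRequest"})


# Hypothetical URL, for illustration only; nothing is sent over the network.
request = build_xhr_style_request("https://example.com/discover?page=1")
print(request.get_method(), request.full_url)
print(request.header_items())
```

Whether a given site actually requires this header is something to confirm by comparing against the request headers shown in the browser.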
1. Find the Request URL of the asynchronously loaded data
Screenshot example: [screenshot showing the Request URL]
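Once the Request URL has been copied, taking it apart shows which query parameter drives the loading of additional content. A minimal sketch with the standard library follows; the helper name `split_request_url` is hypothetical, and the sample path is the one that appears in this article's script.

```python
from urllib.parse import parse_qs, urlsplit


def split_request_url(request_url):
    """Return the path and the parsed query parameters of a Request URL."""
    parts = urlsplit(request_url)
    return parts.path, parse_qs(parts.query)


path, params = split_request_url("/discover?page=2")
print(path)    # /discover
print(params)  # {'page': ['2']}
```

Here the only parameter is `page`, which suggests that requesting successive page numbers returns successive batches of data.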
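2. Find the image's absolute location in the HTML page
Screenshot example: [screenshot] Note that the page's JavaScript dynamically adds new div tags.
Copy the img element's absolute location (its selector path) in the HTML page.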
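The selector used later in this article, `a.cover-inner > img`, matches every `img` that is a direct child of an `a` element with the class `cover-inner`. To make that rule concrete, here is a rough standard-library sketch of the same matching logic. It is a stand-in for illustration, not how BeautifulSoup implements `select`; the class name `CoverImageCollector` and the sample HTML are invented for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class CoverImageCollector(HTMLParser):
    """Collect src of every <img> whose direct parent is <a class="cover-inner">."""

    def __init__(self):
        super().__init__()
        self._stack = []   # open elements as (tag, classes) pairs
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self._stack:
            parent_tag, parent_classes = self._stack[-1]
            if parent_tag == "a" and "cover-inner" in parent_classes:
                self.sources.append(attrs.get("src"))
        if tag not in VOID_TAGS:
            self._stack.append((tag, (attrs.get("class") or "").split()))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


# Invented sample markup: one cover image and one unrelated image.
SAMPLE_HTML = """
<div class="thing">
  <a class="cover-inner" href="/things/tbot"><img src="https://example.com/a.jpg"></a>
  <img src="https://example.com/not-a-cover.jpg">
</div>
"""

collector = CoverImageCollector()
collector.feed(SAMPLE_HTML)
print(collector.sources)  # ['https://example.com/a.jpg']
```

The second image is skipped because its parent is the `div`, not the `a.cover-inner` element, which is exactly what the `>` (direct child) combinator requires.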
3. Scrape the asynchronously loaded data
This approach works for sites that keep loading more content, page after page.
Code example:
from bs4 import BeautifulSoup
import requests
import time

# The host is missing from this URL in the source; requests needs the full URL.
url = '/discover?page='

def get_page(Url, data=None):
    print(Url)
    wb_data = requests.get(Url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    imgs = soup.select('a.cover-inner > img')  # all cover images on the page
    titles = soup.select('section.content > h4 > a')  # anchors that carry each title
    links = soup.select('section.content > h4 > a')  # the same anchors, read for href
    if data is None:
        for img, title, link in zip(imgs, titles, links):
            data = {
                'img': img.get('src'),
                'title': title.get('title'),
                'link': link.get('href')
            }
            print(data)

def get_more_pages(Url, start, end):
    for one in range(start, end):
        get_page(Url + str(one))  # append the page number
        time.sleep(1)  # pause 1 second between requests to avoid an IP ban

get_more_pages(url, 1, 10)  # fetch pages 1 through 9
Output:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3 /Users/mac/Desktop/data/cloudbility/四周爬虫/2-KneWOne.py
/discover?page=1
{'img': 'https://making-photos./photos/dfaec1d3ba6df86562f9699869ababd4.jpg!thing.fixed.big', 'title': 'Osmo 儿童游戏套件', 'link': '/things/osmo-er-tong-you-xi-tao-jian'}
{'img': 'https://making-photos./photos/2883a5c06b4da12a0cde1b7dff26b104.jpg!thing.fixed.big', 'title': 'TBot', 'link': '/things/tbot'}
{'img': 'https://making-photos./photos/aaeb7de0751ebc627c0971deb633b265.jpg!thing.fixed.big', 'title': 'olloclip 四合一摄像镜头 iPhone 6/6 Plus 版', 'link': '/things/olloclip-si-he-she-xiang-jing-tou-iphone-6-slash-6-plus-ban'}
{'img': 'https://making-photos./photos/91063c24d62dead12a8d9e2a54887f51.jpg!thing.fixed.big', 'title': 'Momax SelfiFit Mini 蓝牙自拍器', 'link': '/things/momax-selfifit-lan-ya-zi-pai-qi'}
{'img': 'https://making-photos./photos/a97bf7f2a200bace8bd1d629b6436b85.jpg!thing.fixed.big', 'title': '贱驴 007', 'link': '/things/jian-lu-007-1'}
{'img': 'https://making-photos./photos/8c7cebb9844a6a86123e8554b8e4.jpg!thing.fixed.big', 'title': 'Moshi Xync Lightning Keychain 连接线', 'link': '/things/moshi-xync-lightning-keychain-lian-jie-xian'}
{'img': 'https://making-photos./photos/9fb1f1372d5b2c486e3ca903ca11826e.jpg!thing.fixed.big', 'title': 'RainDesign iLevel 2 支架', 'link': '/things/raindesign-ilevel-2'}
{'img': 'https://making-photos./photos/5ad99a590f7be4ba4b0df26f94e2c8a4.jpg!thing.fixed.big', 'title': 'Smart Rope 智能跳绳', 'link': '/things/smart-rope-zhi-neng-tiao-sheng'}
{'img': 'https://making-photos./photos/394b7f9a17177fe1278dc0b6ee51dea5.jpg!thing.fixed.big', 'title': 'Magic 桜', 'link': '/things/magic-ying'}
{'img': 'https://making-photos./photos/fa6e238c053df828fb6d516656c8648c.jpg!thing.fixed.big', 'title': 'SwatchMate Color Capturing', 'link': '/things/swatchmate-color-capturing-cube'}
{'img': 'https://making-photos./photos/edd9f874c4bf1a5f371ca2e3cba5f01b.jpg!thing.fixed.big', 'title': 'foto.sosho', 'link': '/things/foto-dot-sosho'}
{'img': 'https://making-photos./photos/30bfd4c95fd4bafa058e3485b117a2a0.jpg!thing.fixed.big', 'title': 'Bang & Olufsen BeoPlay H8', 'link': '/things/bang-and-olufsen-beoplay-h8'}
/discover?page=2
{'img': 'https://making-photos./photos/6249e13497c84f8d850521a201af008a.jpg!thing.fixed.big', 'title': 'Woody 可折叠创意书灯', 'link': '/things/woody-ke-zhe-die-chuang-yi-shu-deng'}
{'img': 'https://making-photos./photos/e46d335dcf45605aae2e5889a0eaf4b6.jpg!thing.fixed.big', 'title': '创意智能感应温度显示魔术水杯', 'link': '/things/chuang-yi-zhi-neng-gan-ying-wen-du-xian-shi-mo-zhu-shui-bei'}
{'img': 'https://making-photos./photos/cf7d91d87fd70dd909750e7bd4e81f4d.jpg!thing.fixed.big', 'title': 'Broadlink RM Pro', 'link': '/things/broadlink-rm-pro'}
{'img': 'https://making-photos./photos/bae34600ab52cba16aed13309c70999e.jpg!thing.fixed.big', 'title': 'Anker USB 3.0 4-Port Hub', 'link': '/things/anker-r-usb-3-dot-0-4-port'}
{'img': 'https://making-photos./photos/62963e790016b599c0c4af6f63339848.jpg!thing.fixed.big', 'title': 'Eva Solo Fridge Carafe 彩漾冷水瓶', 'link': '/things/eva-solo-fridge-carafe-cai-yang-leng-shui-ping'}
{'img': 'https://making-photos./photos/b866f966623b4af3b4b00079c572b06e.jpg!thing.fixed.big', 'title': 'Flick Candles', 'link': '/things/flick-candles'}
{'img': 'https://making-photos./photos/0ff87359eaff7ca5642d450926ab536d.jpg!thing.fixed.big', 'title': '欧蒂芙 天使冰膜', 'link': '/things/ou-di-fu-tian-shi-bing-mo'}
{'img': 'https://making-photos./photos/2cda0a6b2421529b0fee175237ebdee1.png!thing.fixed.big', 'title': 'Propolinse 比那氏蜂胶复合漱口水', 'link': '/things/propolinse-bi-na-shi-feng-xiao-fu-he-shu-kou-shui'}
{'img': 'https://making-photos./photos/cc8147ca5612bf0a7e6a13b7e971fd3e.jpg!thing.fixed.big', 'title': 'Withings Activité', 'link': '/things/withings-activite'}
{'img': 'https://making-photos./photos/d5585da14eb35e2cf82b855bc406afd7.jpg!thing.fixed.big', 'title': 'RainDesign mStand', 'link': '/things/mstand-laptop-stand'}
{'img': 'https://making-photos./photos/04847c2ae796e6af39ab1f821701f044.jpg!thing.fixed.big', 'title': '萌奇 x JOWAY 小冰棒', 'link': '/things/meng-qi-x-joway-xiao-bing-bang'}
{'img': 'https://making-photos./photos/ade52894eed66ae6992de74950005e47.jpg!thing.fixed.big', 'title': 'Rivers Demita', 'link': '/things/rivers-demita'}
/discover?page=3
{'img': 'https://making-photos./photos/f27e40a07d9465efb0f868b940865626.jpg!thing.fixed.big', 'title': 'Adorable Pet Bed 可爱的鱼形宠物床', 'link': '/things/adorable-pet-bed-ke-ai-de-yu-xing-chong-wu-chuang'}
{'img': 'https://making-photos./photos/43d62540f88969581065fc969cdf1439.jpg!thing.fixed.big', 'title': 'gatsby crazy cool 超凉身体降温喷雾', 'link': '/things/gatsby-crazy-cool-chao-liang-shen-ti-jiang-wen-pen-wu'}
{'img': 'https://making-photos./photos/81995fd123f247bd850a0a46783fdf76.jpg!thing.fixed.big', 'title': 'UNI-CUB β 电动代步车', 'link': '/things/uni-cub-b'}
{'img': 'https://making-photos./photos/b4a5f52fb530ad5341e94dc9d419f181.jpg!thing.fixed.big', 'title': 'Stadler Form OTTO 古典原木风扇', 'link': '/things/stadler-form-otto-gu-dian-yuan-mu-feng-shan'}
{'img': 'https://making-photos./photos/87014d301ec4da5c2906f0ad3d6a81a8.jpg!thing.fixed.big', 'title': 'momoda 床头宝—互联网智能闹钟', 'link': '/things/momoda-chuang-tou-bao-hu-lian-wang-zhi-neng-nao-zhong'}
{'img': 'https://making-photos./photos/f2d41f734f108c19fd6a84adf6eed760.jpg!thing.fixed.big', 'title': 'Ithink 手立视 智能网络摄像头', 'link': '/things/ithink-shou-li-shi-zhi-neng-wang-luo-she-xiang-tou'}
{'img': 'https://making-photos./photos/310e8c6edcc832639de94f1d208fad32.jpg!thing.fixed.big', 'title': '丝瓜年代 牛皮笔记本', 'link': '/things/si-gua-nian-dai-niu-pi-bi-ji-ben'}
{'img': 'https://making-photos./photos/899266e88c00d2413fcab437eb87a6e1.jpg!thing.fixed.big', 'title': 'Okamura Duke CZ 真皮办公椅', 'link': '/things/okamura-duke-cz-zhen-pi-ban-gong-yi'}
{'img': 'https://making-photos./photos/ed11079d3d623b95b1a29fa7fe975e40.jpg!thing.fixed.big', 'title': '麦芽智能灯', 'link': '/things/mai-ya-zhi-neng-deng'}
{'img': 'https://making-photos./photos/8fea9e5b30d9bb9ba0a4493fd981b36d.jpg!thing.fixed.big', 'title': 'Gekkopod 壁虎自拍支架', 'link': '/things/gekkopod-bi-hu-zi-pai-zhi-jia'}
{'img': 'https://making-photos./photos/5ae10361c86fbf47692e68c1f1b166eb.png!thing.fixed.big', 'title': 'TACS Twenty 4 TS1101 男士石英手表', 'link': '/things/tacs-twenty-4-ts1101-nan-shi-shi-ying-shou-biao'}
{'img': 'https://making-photos./photos/a590ee9c3cd457c6234923815df8fe9a.png!thing.fixed.big', 'title': 'TACS Pixel TS1302 男士石英手表', 'link': '/things/tacs-pixel-ts1302-nan-shi-shi-ying-shou-biao'}
/discover?page=4
{'img': 'https://making-photos./photos/2ec807d8a030f5eb9e3ecd0d88ef771a.png!thing.fixed.big', 'title': 'LOCA 超薄透明手机壳', 'link': '/things/loca-i6-slash-6-plus-chao-bo-tou-ming-shou-ji-ke'}
{'img': 'https://making-photos./photos/816411b67f901ddf3d03bb47d15c9e2b.png!thing.fixed.big', 'title': 'TACS Kraft TS1306 男士石英手表', 'link': '/things/tacs-kraft-ts1306-nan-shi-shi-ying-shou-biao'}
{'img': 'https://making-photos./photos/28751c8173b9692a06b32f1a5c3e36a3.jpg!thing.fixed.big', 'title': 'imblu 多功能洗漱包', 'link': '/things/imblu-duo-gong-neng-shou-na-bao'}
{'img': 'https://making-photos./photos/abbfe26bec5e061b46959743f1c3a3c6.jpg!thing.fixed.big', 'title': 'Super-Soft Ear Muff for Sleeping', 'link': '/things/super-soft-ear-muff-for-sleeping'}
{'img': 'https://making-photos./photos/05dd5c32dfdd95a954aa23d0b288f0fe.jpg!thing.fixed.big', 'title': 'Bluelounge posto 耳机支架', 'link': '/things/bluelounge-posto-er-ji-zhi-jia'}
{'img': 'https://making-photos./photos/987dbcd0fee536d08554742bd20687d2.png!thing.fixed.big', 'title': 'ANGELHOOD 日式祝愿娃娃', 'link': '/things/angelhood-ri-shi-zhu-yuan-wa-wa'}
{'img': 'https://making-photos./photos/0632f191508933848d123bfd33b82d51.jpg!thing.fixed.big', 'title': 'ten Design Stationery 转动圆珠笔', 'link': '/things/ten-design-stationery-zhuan-dong-yuan-zhu-bi'}
{'img': 'https://making-photos./photos/a58213fe2e341928f2e424338f37956c.jpg!thing.fixed.big', 'title': '保友 - Ergonor 独立脚托', 'link': '/things/ergonor-du-li-jiao-tuo'}
{'img': 'https://making-photos./photos/167446c4da1e1f971b1aeebf005dc685.jpg!thing.fixed.big', 'title': 'IdeaShow 阿拉神灯', 'link': '/things/ideashow-a-la-shen-deng'}
{'img': 'https://making-photos./photos/09ebc797c57930ef6ddc56d9d9f86495.jpg!thing.fixed.big', 'title': 'Fabriano Boutique 匹诺曹钢笔', 'link': '/things/fabriano-boutique-pi-nuo-cao-gang-bi'}
{'img': 'https://making-photos./photos/22619d5ab27a7ebec352c3fe20396ab9.jpg!thing.fixed.big', 'title': 'IRIS OHYAMA 手持除螨吸尘器', 'link': '/things/iris-ohyama-shou-chi-chu-man-xi-chen-qi'}
{'img': 'https://making-photos./photos/3ddde827164ea9779c3d3732ca67ad0e.jpg!thing.fixed.big', 'title': 'Fred & Friends Doomed Crystal Skull Shot Glass', 'link': '/things/fred-and-friends-doomed-crystal-skull-shot-glass'}
/discover?page=5
{'img': 'https://making-photos./photos/316f43719495992bcf30ec89a85d7ec8.jpg!thing.fixed.big', 'title': '网格吊床', 'link': '/things/wang-ge-diao-chuang'}
{'img': 'https://making-photos./photos/2d09c1b3db6b58c633c7b1e3ed1ac6cb.jpg!thing.fixed.big', 'title': 'Broadlink TC1', 'link': '/things/broadlink-tc1'}
{'img': 'https://making-photos./photos/a495fbcb757a0b061971ed15ccaff5f9.jpg!thing.fixed.big', 'title': 'DIVOOM Travel III 音箱', 'link': '/things/divoom-travel-mi-ni-san-fang-hu-wai-lan-ya-xiao-yin-xiang'}
{'img': 'https://making-photos./photos/d0877ed9dfa2cd855aa5acfd32cdcc62.jpg!thing.fixed.big', 'title': '乐事 A200', 'link': '/things/le-shi-a200'}
{'img': 'https://making-photos./photos/46b8cdbf2053b87d8c4c8d275b3d5b05.jpg!thing.fixed.big', 'title': 'Bang & Olufsen Beoplay H4', 'link': '/things/bang-and-olufsen-beoplay-h4'}
{'img': 'https://making-photos./photos/0f48595529627efd5ed58d5791385a63.jpg!thing.fixed.big', 'title': '追求科技 ZQ16 艺术签名版', 'link': '/things/zhui-qiu-ke-ji-zq16-yi-zhu-qian-ming-ban'}
{'img': 'https://making-photos./photos/d342b49d8effb9f3e60258c39e9193f1.jpeg!thing.fixed.big', 'title': 'Philips SHE4205BK', 'link': '/things/philips-she4205bk'}
{'img': 'https://making-photos./photos/dd5616e71ea578c2b00c68656fbadaa8.jpg!thing.fixed.big', 'title': 'SONY a7', 'link': '/things/sony-a7-1'}
{'img': 'https://making-photos./photos/6bccdd5c134977ca5149fc0101862340.jpg!thing.fixed.big', 'title': 'Humanscale word chair', 'link': '/things/humanscale-word-chair'}
{'img': 'https://making-photos./photos/0d55169f38de18740eb9e4fdc5c76865.jpg!thing.fixed.big', 'title': 'Cutipol DUNA Gold', 'link': '/things/cutipol-duna-gold'}
{'img': 'https://making-photos./photos/2194ed37ca0092372763cd9ee2d490e3.jpg!thing.fixed.big', 'title': 'Dunoon 向日葵马克杯', 'link': '/things/dunoon-xiang-ri-kui-ma-ke-bei'}
{'img': 'https://making-photos./photos/819bc7954c54311c88952b956fca4901.jpg!thing.fixed.big', 'title': '质造 星球杯', 'link': '/things/zhi-zao-xing-qiu-bei'}
.....
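The run above walks a fixed range of pages. For a site that keeps loading more content, one possible variation is to keep requesting pages until one comes back empty. The sketch below shows only that loop; `crawl_until_empty`, `fetch_items`, and `max_pages` are hypothetical names, and `fake_fetch` stands in for a real fetcher so the example runs without network access.

```python
import time


def crawl_until_empty(fetch_items, start=1, max_pages=50, delay=1.0):
    """Call fetch_items(page) for increasing page numbers until a page is empty."""
    collected = []
    for page in range(start, start + max_pages):
        items = fetch_items(page)
        if not items:
            break  # an empty page is taken to mean there is nothing more to load
        collected.extend(items)
        time.sleep(delay)  # stay polite between requests
    return collected


# Stand-in for a real page fetcher: three pages of fake items, then nothing.
def fake_fetch(page):
    return [f"item-{page}-{i}" for i in range(2)] if page <= 3 else []


print(len(crawl_until_empty(fake_fetch, delay=0)))  # 6
```

A real `fetch_items` could wrap the parsing done in `get_page`, returning the list of dictionaries instead of printing them. The `max_pages` cap is a safety net in case a site never returns an empty page.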
(End)
Original article: Fetching Dynamically Loaded Web Page Data with Python - 运维之路 (Opsroad) community