scrapy
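# -*- coding: utf-8 -*-
__author__ = 'Administrator'

import scrapy


class QuoteSpider(scrapy.Spider):
    name = 'poxiao'
    start_urls = ['https://www.poxiao.com/type/movie/']

    def parse(self, response):  # default callback name; Scrapy calls parse() for each start URL
        quotes = response.xpath('//li/h3')  # one <h3> per movie entry in the list
        for quote in quotes:
            href = quote.xpath('./a/@href').extract_first()
            yield {
                'name': quote.xpath('./a/text()').extract_first(),
                # turn the relative href into an absolute URL (skips entries without a link)
                'link': response.urljoin(href) if href else None,
            }

        # the second-to-last link in the pager is treated as the "next page" link
        next_page = response.xpath('//div[@class="list-pager"]/a[last()-1]/@href').extract_first()
        if next_page:
            yield response.follow(next_page, self.parse)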
Using Scrapy to scrape the link addresses from a web page.
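To see what the spider's XPath expressions select without hitting the live site, here is a stdlib-only sketch. It swaps Scrapy's selectors for xml.etree.ElementTree, which supports a subset of XPath (including the `[last()-1]` position predicate), and runs it on a hypothetical, simplified HTML snippet; the real poxiao.com markup may differ, and `extract` / `SAMPLE_PAGE` are illustrative names only.

```python
# Stdlib-only sketch: evaluate the spider's XPath paths with xml.etree.ElementTree.
# SAMPLE_PAGE is a HYPOTHETICAL, simplified, well-formed stand-in for the real page.
import xml.etree.ElementTree as ET

SAMPLE_PAGE = """
<html><body>
  <ul>
    <li><h3><a href="/movie/1.html">Movie One</a></h3></li>
    <li><h3><a href="/movie/2.html">Movie Two</a></h3></li>
  </ul>
  <div class="list-pager">
    <a href="/type/movie/index_1.html">1</a>
    <a href="/type/movie/index_2.html">2</a>
    <a href="/type/movie/index_2.html">Next</a>
    <a href="/type/movie/index_9.html">Last</a>
  </div>
</body></html>
"""

BASE = 'https://www.poxiao.com'


def extract(page):
    """Return (items, next_page_href) using the same paths as the spider."""
    root = ET.fromstring(page.strip())
    items = []
    # ElementTree needs a relative './/' where the spider uses '//'
    for h3 in root.findall('.//li/h3'):
        a = h3.find('./a')
        items.append({'name': a.text, 'link': BASE + a.get('href')})
    # a[last()-1] selects the second-to-last <a> in the pager ("Next" here)
    pager_link = root.find(".//div[@class='list-pager']/a[last()-1]")
    next_page = pager_link.get('href') if pager_link is not None else None
    return items, next_page


if __name__ == '__main__':
    items, next_page = extract(SAMPLE_PAGE)
    for item in items:
        print(item)
    print('next page:', next_page)
```

The `[last()-1]` choice only works if the site's pager really ends with "Next" followed by "Last"; on the final page that position may point elsewhere, so it is worth checking against the real markup.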
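scrapy runspider ***.py    runs the spider file
scrapy runspider ***.py -o aa.json    saves the scraped items as a JSON file
scrapy runspider ***.py -o aa.csv -t csv    saves them as a CSV file, which Excel can open (-t csv is optional, since Scrapy infers the format from the .csv extension)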
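A minimal sketch, using only the standard library, for reading the exported files back into Python. It assumes Scrapy's default feed formats: the JSON exporter writes a single array of item objects, and the CSV exporter writes a header row of field names followed by one row per item. The helper names and the script name read_feed.py are hypothetical.

```python
# Minimal sketch: load the files produced by `-o aa.json` / `-o aa.csv`.
import csv
import io
import json
import sys


def parse_json_feed(text):
    # the JSON feed is one array of item objects
    return json.loads(text)


def parse_csv_feed(text):
    # the CSV feed has a header row of field names, then one row per item
    return list(csv.DictReader(io.StringIO(text)))


def load_feed(path):
    # newline='' keeps CSV line endings intact for the csv module
    with open(path, encoding='utf-8', newline='') as f:
        text = f.read()
    return parse_csv_feed(text) if path.endswith('.csv') else parse_json_feed(text)


if __name__ == '__main__':
    # usage: python read_feed.py aa.json aa.csv
    for path in sys.argv[1:]:
        for item in load_feed(path):
            print(item)
```

Note that -o appends to an existing file, so rerunning into the same aa.json can leave invalid JSON; delete the file first, or use -O (Scrapy 2.4+) to overwrite. If Excel shows garbled non-ASCII names in aa.csv, adding `-s FEED_EXPORT_ENCODING=utf-8-sig` is one commonly used workaround.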