Automatically switching proxy IPs in Scrapy when an IP expires mid-crawl
When requests go out through short-lived proxy IPs, which typically expire within about 1 to 5 minutes, Scrapy starts reporting errors like the following:
2020-01-17 17:00:48 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://xxxx/co s): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it.
So how can we switch to a fresh IP automatically and re-issue the request?
Looking at Scrapy's overall architecture diagram, this error is raised by the RetryMiddleware downloader middleware, i.e. step 5 in the architecture diagram.
So one approach is to create a new downloader middleware (called ProxyRetryMiddleware here) that inherits from RetryMiddleware and overrides process_exception to reset the request's proxy before retrying:
from scrapy.downloadermiddlewares.retry import RetryMiddleware


class ProxyRetryMiddleware(RetryMiddleware):

    def process_exception(self, request, exception, spider):
        # For timed-out or refused connections (WinError 10060 / 10061), fetch a
        # new proxy IP, set it on the request, then re-issue the request
        if '10061' in str(exception) or '10060' in str(exception):
            self.proxy_ip = fetch_proxy_ip()  # your own helper that returns "ip:port"
            if self.proxy_ip:
                current_proxy = f'http://{self.proxy_ip}'
                request.meta['proxy'] = current_proxy
            if isinstance(exception, self.EXCEPTIONS_TO_RETRY) and not request.meta.get('dont_retry', False):
                return self._retry(request, exception, spider)
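The fetch_proxy_ip() helper is not shown in the original post. The sketch below is a hypothetical version, assuming a proxy-pool HTTP endpoint that returns a single "ip:port" string per call; the URL and response format are placeholders for whatever provider you use.

# Hypothetical sketch -- not from the original post. Assumes a proxy provider
# whose HTTP API returns one "ip:port" string; adjust the URL and parsing to your provider.
import requests

def fetch_proxy_ip():
    try:
        resp = requests.get('http://your-proxy-provider.example/get', timeout=5)
        ip_port = resp.text.strip()
        return ip_port or None
    except requests.RequestException:
        # no usable proxy right now; the caller keeps the old one and lets retry handle it
        return None

For Scrapy to actually call this middleware, it also has to be enabled in settings.py. A minimal sketch, assuming the class lives in myproject/middlewares.py (both the project and module names are placeholders for your own layout); the built-in RetryMiddleware is disabled so the two do not handle the same exception twice:

# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    # disable the stock retry middleware...
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # ...and run the custom one in its place, at the same priority (550)
    'myproject.middlewares.ProxyRetryMiddleware': 550,
}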