I implemented a ban_policy that marks a 302 redirect as a "ban".
But once the request reaches the maximum retries it is let through and therefore picked up by scrapy.downloadermiddlewares.redirect,
which in turn restarts a full max_proxies_to_try cycle for the redirected request (a useless captcha page):
```
2020-10-02 05:31:07 [rotating_proxies.middlewares] DEBUG: Gave up retrying <GET http://www.url.com> (failed 6 times with different proxies)
2020-10-02 05:31:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.url.com/redirected/to/captacha> from <GET http://www.url.com>
2020-10-02 05:31:10 [rotating_proxies.middlewares] DEBUG: Gave up retrying <GET http://www.url.com/redirected/to/captacha> (failed 6 times with different proxies)
```
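For context, the ban policy I described can be sketched roughly as below. This is a minimal, self-contained illustration: in the actual project the class would subclass `rotating_proxies.policy.BanDetectionPolicy` and be registered through the `ROTATING_PROXY_BAN_POLICY` setting; the class and method names here follow that interface, but the status whitelist is a made-up example.

```python
class Ban302Policy:
    """Illustrative ban policy: treat a 302 redirect as a proxy ban.

    In a real spider this would subclass
    rotating_proxies.policy.BanDetectionPolicy and be enabled via the
    ROTATING_PROXY_BAN_POLICY setting.
    """

    # Hypothetical whitelist of statuses considered "not banned".
    NOT_BAN_STATUSES = {200, 301}

    def response_is_ban(self, request, response):
        # A 302 here means we were bounced to a captcha page -> ban.
        return response.status not in self.NOT_BAN_STATUSES

    def exception_is_ban(self, request, exception):
        # No opinion on exceptions; let the default handling decide.
        return None
```

With this policy every 302 response counts toward `proxy_retry_times`, which is what triggers the behaviour in the log above.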
Shouldn't we add a `raise IgnoreRequest()`, like so:
```python
from scrapy.exceptions import IgnoreRequest


def _retry(self, request, spider):
    retries = request.meta.get('proxy_retry_times', 0) + 1
    max_proxies_to_try = request.meta.get('max_proxies_to_try',
                                          self.max_proxies_to_try)

    if retries <= max_proxies_to_try:
        logger.debug("Retrying %(request)s with another proxy "
                     "(failed %(retries)d times, "
                     "max retries: %(max_proxies_to_try)d)",
                     {'request': request, 'retries': retries,
                      'max_proxies_to_try': max_proxies_to_try},
                     extra={'spider': spider})
        retryreq = request.copy()
        retryreq.meta['proxy_retry_times'] = retries
        retryreq.dont_filter = True
        return retryreq
    else:
        logger.debug("Gave up retrying %(request)s (failed %(retries)d "
                     "times with different proxies)",
                     {'request': request, 'retries': retries},
                     extra={'spider': spider})
        raise IgnoreRequest("Max retries reached")
```