Banned responses get to the engine in the end #48

Open
3hhh opened this issue Sep 19, 2020 · 0 comments

3hhh commented Sep 19, 2020

If `response_is_ban(self, request, response)` identifies a response as a ban, the response currently still reaches the spider's `parse()` method after the middleware's final retry attempt: once more than `ROTATING_PROXY_PAGE_RETRY_TIMES` attempts have been banned, the middleware neither raises an exception nor otherwise drops the response.
This is inconvenient because the user has to call `response_is_ban(self, request, response)` again in their `parse()` implementation.
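
To make the expected behaviour concrete, here is a minimal, library-independent sketch of what I would expect: once the retries are used up, a banned response is stopped rather than handed on. `BannedResponseError`, `fetch_with_retries`, the `download` callable and the "403 means ban" rule are all hypothetical stand-ins, not part of the middleware or Scrapy.

```python
class BannedResponseError(Exception):
    """Hypothetical exception raised when every attempt was banned."""


def response_is_ban(request, response):
    # Stand-in ban policy (assumption): treat HTTP 403 as a ban.
    return response["status"] == 403


def fetch_with_retries(request, download, retry_times):
    """Try once, then retry at most `retry_times` times on banned responses."""
    for _ in range(retry_times + 1):
        response = download(request)
        if not response_is_ban(request, response):
            return response  # only non-banned responses reach the caller
    # Every attempt was banned: stop here instead of passing the response on.
    raise BannedResponseError(f"{request!r} still banned after {retry_times} retries")


# Usage: the first attempt is banned, the single retry succeeds.
attempts = iter([{"status": 403}, {"status": 200}])
result = fetch_with_retries("https://example.com", lambda req: next(attempts), retry_times=1)
```

With this shape, the caller (the equivalent of `parse()`) never sees a banned response and does not need to repeat the ban check.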

Apart from that, I also noticed that `ROTATING_PROXY_PAGE_RETRY_TIMES = 1` generally results in 2 retries rather than 1; the number of retries is always one more than `ROTATING_PROXY_PAGE_RETRY_TIMES`.
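
For illustration only, one common way such an off-by-one can arise is an inclusive comparison against the limit. This is an assumption about the general pattern, not a diagnosis of the middleware's actual code; `count_retries` and its `strict` flag are hypothetical.

```python
def count_retries(limit, strict):
    """Count retries performed when every response is banned.

    strict=True  checks `retries < limit`  (retries == limit).
    strict=False checks `retries <= limit` (one retry too many).
    """
    retries = 0
    while True:
        allowed = retries < limit if strict else retries <= limit
        if not allowed:
            return retries
        retries += 1  # the request was banned again, so retry once more


strict_count = count_retries(1, strict=True)      # 1 retry
inclusive_count = count_retries(1, strict=False)  # 2 retries, as observed
```

The inclusive variant reproduces the numbers reported above: a limit of 1 yields 2 retries.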
