Hello,
I discovered that using get_random() to choose a proxy from the list is not optimal. Here is my example:
I crawl a site that uses DataDome to protect itself from crawling, so, to avoid being banned, I use a DOWNLOAD_DELAY of 180 seconds.
I have 2 proxies in ROTATING_PROXY_LIST
DOWNLOAD_DELAY=180
CONCURRENT_REQUESTS_PER_DOMAIN=1
CONCURRENT_REQUESTS=2 (the same as the number of proxies)
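For reference, this is the relevant part of my configuration as a settings.py sketch (the proxy URLs below are placeholders, not my real proxies):

```python
# settings.py -- configuration described above
ROTATING_PROXY_LIST = [
    "http://proxy1.example.com:8000",   # placeholder
    "http://proxy2.example.com:8000",   # placeholder
]
DOWNLOAD_DELAY = 180                    # 180 s between requests, to avoid DataDome bans
CONCURRENT_REQUESTS_PER_DOMAIN = 1
CONCURRENT_REQUESTS = 2                 # same as the number of proxies
```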
Sometimes get_random() returns the same proxy that the spider is already using, so the request has to wait for the DOWNLOAD_DELAY to elapse.
Would it be possible to replace get_random() with a get_unused() function, i.e. a function that returns the first "free" proxy that is not currently inside its DOWNLOAD_DELAY window?
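For illustration, here is a rough sketch of the selection logic I have in mind. The ProxyPool class, its attributes, and get_unused() are hypothetical names for this sketch, not the middleware's actual API; it just tracks when each proxy was last handed out and prefers one whose delay has already elapsed:

```python
import random
import time


class ProxyPool:
    """Illustrative pool (hypothetical names, not the middleware's API)."""

    def __init__(self, proxies, download_delay):
        self.proxies = list(proxies)
        self.download_delay = download_delay
        # When each proxy was last handed out (0.0 = never used yet).
        self.last_used = {proxy: 0.0 for proxy in self.proxies}

    def get_unused(self):
        """Return the first proxy whose DOWNLOAD_DELAY has elapsed,
        or None if every proxy is still waiting out its delay."""
        now = time.time()
        for proxy in self.proxies:
            if now - self.last_used[proxy] >= self.download_delay:
                self.last_used[proxy] = now
                return proxy
        return None

    def get_random(self):
        """Current behaviour: may pick a proxy that is still waiting."""
        proxy = random.choice(self.proxies)
        self.last_used[proxy] = time.time()
        return proxy
```

With two proxies and DOWNLOAD_DELAY = 180, get_unused() would alternate between them instead of sometimes handing out the busy proxy twice in a row.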
thank you
fred
1st file: log where I observed the problem (see the comments on the right)
2nd file: log without the problem (see the comments on the right)
1st log.txt
2nd log.txt