What's the current rate limit for CDX search? #153
Comments
Just to be clear, this isn't an official package from the Internet Archive, so for most questions not specifically about this Python package, you should contact them directly. BUT I do try to keep in close contact with the staff there, and the current limit for requests to

If you are using this package, it does its best to stick to the limits for you automatically, but we fixed some significant rate-limit issues in the latest release (v0.4.4), and the next release (v0.5.0, hopefully later this month 🤞) has a complete overhaul of rate limits, so make sure you're on the latest version!

Also keep in mind that rate limits in this library are expressed in calls per second, so to make a request every 1.25s, you should configure:

client = WaybackClient(WaybackSession(search_calls_per_second=0.8))

And make sure to back off that value even more if you are using multiple clients on multiple threads. Also be careful not to create too many HTTP connections if you are multithreading! That'll be easier in v0.5.0, but in the current release, doing so is messy; see #106 (comment).

Finally, once you receive a 429 response, make sure to stop all new requests immediately and do not start again for at least 60s. If you make new requests during that 60s window, your IP will get blocked for progressively longer time periods, from a few hours up to a few days.
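For anyone making requests by hand rather than relying on the package's built-in limiting, here is a rough sketch of the two ideas above: converting a gap between requests into a calls-per-second value, and pausing for 60s after a 429. The names calls_per_second and CooldownThrottle are hypothetical helpers written for this example, not part of this package, and the sketch is single-threaded only.

```python
import time


def calls_per_second(interval_seconds, workers=1):
    """Convert a desired gap between requests into a per-client rate.

    With several clients running in parallel, each one gets an equal
    share so the combined rate stays at 1 / interval_seconds.
    """
    if interval_seconds <= 0 or workers < 1:
        raise ValueError("interval_seconds must be > 0 and workers >= 1")
    return 1.0 / (interval_seconds * workers)


class CooldownThrottle:
    """Hypothetical helper: space out calls and pause after a 429.

    Not thread-safe; intended for a single-threaded script.
    """

    def __init__(self, min_interval, cooldown=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.cooldown = cooldown          # seconds to pause after a 429
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval

    def record_429(self):
        """Push the next allowed request at least `cooldown` seconds out."""
        self._next_allowed = max(self._next_allowed,
                                 self._clock() + self.cooldown)
```

Here calls_per_second(1.25) gives 0.8, the value used in the configuration above, and calls_per_second(1.25, workers=2) gives 0.4 per client. In a request loop you would call wait() before each request and record_429() whenever a response comes back with status 429; the clock and sleep parameters exist only so the timing can be tested without real delays.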
Thanks for the clear explanation! I'm running the script (single-threaded) from a GCP VM, so I guess that's why it got rate limited so quickly.
Hi there, I'm currently sending a search request every 1.25 seconds continuously, but I soon received 429 errors. May I ask what the current recommended rate limit for the CDX search API is? Thanks!