Multiple threads #106
Comments
This package is currently based on the widely used Requests package, which is unfortunately not thread-safe. That means that, if you want to make requests from multiple threads, you should create a separate client for each thread. For example:

import threading
from concurrent.futures import ThreadPoolExecutor
import wayback

mementos_to_get = [...]  # a list of CDX records or URLs

# One thread-local storage object, shared by all threads.
thread_local = threading.local()

# Get a unique WaybackClient for whatever thread you're on.
def get_wayback_client():
    if not hasattr(thread_local, 'wayback'):
        thread_local.wayback = wayback.WaybackClient()
    return thread_local.wayback

def get_memento_safely(*args, **kwargs):
    return get_wayback_client().get_memento(*args, **kwargs)

with ThreadPoolExecutor(max_workers=4) as executor:
    for memento in executor.map(get_memento_safely, mementos_to_get):
        ...  # Do something with each memento result

Or, using classic thread classes:

import queue
import threading
import wayback

mementos_to_get = [...]  # a list of CDX records or URLs
class Worker(threading.Thread):
    def __init__(self, input_queue, output_queue):
        super().__init__()
        self.input_queue = input_queue
        self.output_queue = output_queue

    def run(self):
        # Make a client for this thread and use it:
        with wayback.WaybackClient() as client:
            while True:
                try:
                    # This expects the queue to already be full, and not be
                    # added to in real time. Otherwise you should get()
                    # instead of get_nowait().
                    item = self.input_queue.get_nowait()
                except queue.Empty:
                    # This thread is done, so let the run() method end.
                    break

                try:
                    memento = client.get_memento(item)
                    self.output_queue.put(memento)
                except Exception as error:
                    self.output_queue.put(error)
                finally:
                    self.input_queue.task_done()
processing_queue = queue.Queue()
results_queue = queue.Queue()
for item in mementos_to_get:
    processing_queue.put_nowait(item)

# Start the worker threads:
threads = [Worker(processing_queue, results_queue) for i in range(4)]
for thread in threads:
    thread.start()

# Wait for them all to finish:
processing_queue.join()

# Start reading the results:
while not results_queue.empty():
    memento_or_error = results_queue.get()
    ...  # Do something with the result

You can do some really complicated things with queues and worker threads like this if you need to.

That said, thread safety is one of my two next priorities (the other is the Wayback Machine’s new, beta CDX search API). v0.4.0 will be out in the next couple days, and then thread safety should be in v0.5.0. When that’s done, you can just use one client wherever you want, without worrying about whether you are on different threads. But that will take a lot of work, since it means moving off the Requests package. I don’t have a clear timeframe for it. (See #58.)
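To illustrate what that single-client usage might eventually look like, here is a minimal sketch. This is hypothetical: it assumes thread safety actually lands as described in #58, and a client is not safe to share across threads like this today.

from concurrent.futures import ThreadPoolExecutor
import wayback

urls_to_get = [...]  # a list of CDX records or URLs

# Hypothetical: once the client is thread-safe, one shared instance could be
# used from every worker thread instead of one client per thread.
with wayback.WaybackClient() as client:
    with ThreadPoolExecutor(max_workers=4) as executor:
        for memento in executor.map(client.get_memento, urls_to_get):
            ...  # Do something with each memento result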
Relatedly, if your use case is basically “get mementos for a whole list of URLs or domains at once,” I’d appreciate any feedback on how we could or should make a nice wrapper for that in #17. (It will probably be a while before that gets implemented, though!)
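As a rough sketch of the kind of wrapper being discussed there (entirely hypothetical; the function names and signature are made up, and it just reuses the one-client-per-thread pattern from the earlier example):

import threading
from concurrent.futures import ThreadPoolExecutor
import wayback

_thread_local = threading.local()

def _client():
    # One client per worker thread, since clients can't be shared yet.
    if not hasattr(_thread_local, 'wayback'):
        _thread_local.wayback = wayback.WaybackClient()
    return _thread_local.wayback

def _fetch(url):
    return _client().get_memento(url)

# Hypothetical convenience wrapper: fetch mementos for many URLs in parallel.
def get_mementos_bulk(urls, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(_fetch, urls))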
Amazing! Thank you so so much for your explanation <3
I've tried multithreading and got blocked by the website.
Quick update: I’m considering this a duplicate of #58, which I am pretty committed to actually solving this month.

@kyungsub1108 we made a bunch of rate limiting improvements recently in v0.4.4, and have some even bigger ones coming in v0.5.0 later this month (along with actual thread safety, so you can use a single client across multiple threads). Hopefully those help with situations like yours.
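In the meantime, if multithreaded requests are getting you blocked, one generic way to slow things down is to throttle request starts yourself. The sketch below is plain standard-library code, not a wayback feature, and the one-second interval is only a guess:

import threading
import time

_throttle_lock = threading.Lock()
_last_request = 0.0

def throttled(fn, *args, min_interval=1.0, **kwargs):
    # Ensure at least `min_interval` seconds pass between call starts,
    # across all threads, before invoking fn.
    global _last_request
    with _throttle_lock:
        wait = _last_request + min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        _last_request = time.monotonic()
    return fn(*args, **kwargs)

For example, in the worker-thread code above you could call throttled(client.get_memento, item) instead of calling get_memento directly.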
Hello,
I would like to be able to check multiple domains at the same time. Is it okay to use multithreading?