Sketch out a way to support multithreading #23
This implementation is kinda wonky, but is the best way I've come up with to support sessions/clients across multiple threads and pooling connections across multiple threads. It's based on the kind of hacky implementation in edgi-govdata-archiving/web-monitoring-processing#551.

The basic idea here includes two pieces, and works around the fact that urllib3 is thread-safe, while requests is not:

1. `WaybackSession` is renamed to `UnsafeWaybackSession` (denoting it should only be used on a single thread) and a new `WaybackSession` class just acts as a proxy to multiple `UnsafeWaybackSession`s, one per thread.
2. A special subclass of requests's `HTTPAdapter` that takes an instance of urllib3's `PoolManager` to wrap. `HTTPAdapter` itself is really just a wrapper around `PoolManager`, but it always creates a new one. This version just wraps whatever one is given to it.

`UnsafeWaybackSession` now takes a `PoolManager` as an argument, which, if provided, is passed to its `HTTPAdapter`. `WaybackSession` creates one `PoolManager` which it sets on all the actual `UnsafeWaybackSession` objects it creates and proxies access to. That way a single pool of requests is shared across many threads.

This is super wonky! It definitely makes me feel like we might just be better off dropping requests and just using urllib3 directly (especially given #2 -- which means requests wouldn't be part of our public interface in any way). But this is a smaller change that *probably* carries less short-term risk.
    def request(self, *args, **kwargs):
        concrete_session = self.get_unsafe_session()
        return concrete_session.request(*args, **kwargs)
Almost certainly there’s a better way to handle this (e.g. with magic methods, or maybe wrapt).
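For reference, here is a rough sketch of what the "magic methods" option might look like: a proxy that forwards any attribute it doesn't have to a per-thread object via `__getattr__` and `threading.local`. `UnsafeSession` and `ThreadLocalProxy` are hypothetical stand-ins invented for this illustration, not classes from this PR, and the stand-in's `request()` does no real HTTP.

```python
import threading


class UnsafeSession:
    """Hypothetical stand-in for a session that must stay on one thread."""

    def __init__(self):
        self.owner_thread = threading.get_ident()

    def request(self, method, url):
        # A real session would perform an HTTP request here.
        return (method, url, self.owner_thread)


class ThreadLocalProxy:
    """Forwards attribute access to a per-thread object created on demand."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get_unsafe_session(self):
        # Each thread sees its own `session` attribute on the local object.
        session = getattr(self._local, "session", None)
        if session is None:
            session = self._local.session = self._factory()
        return session

    def __getattr__(self, name):
        # Only called when normal lookup fails. Refuse private names so a
        # half-initialized proxy cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_unsafe_session(), name)
```

With something like this, `proxy.request(...)` lands on the calling thread's own session without writing a forwarding method per attribute. One caveat: implicit special-method lookups (for example `with proxy:`) bypass `__getattr__`, so methods like `__enter__` and `__exit__` would still need explicit forwarding.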
Ok, given the lack of feedback, I am probably going to move forward with testing, cleaning up, and merging this approach this week. I don’t think it’s a good approach, though, so I’m also going to start moving forward with #2 and switching to using urllib3 directly instead of requests as a next step. (Httpx claims to be headed for an API-stable release by April, so I want to look into that, too.)
This has been sitting for a long time! A general update: the more I stare at this, the less comfortable I feel with it. It introduces a super-wonky abstraction to get around Requests’s lack of thread-safety and, while that makes multi-threaded use much easier, it probably introduces some maddening corner cases.

I’m pretty sure there’s no emergency here (nobody’s complained, and EDGI [a.k.a. me] is the only user who desperately needs thread safety, but they already have a workaround, even if it’s ugly), so it’s probably better to take the time to put together a nicer solution. I think that means implementing #2 so that Requests is no longer part of our API, and then swapping Requests out for either urllib3 or Httpx, both of which should be thread-safe. (I’m less clear on Httpx’s sync mode being thread-safe, so it needs more investigation. It’s also still in beta.)

I’m leaving this open in case the above doesn’t work out or we run into a real need to ship multithreading support sooner, but I don’t currently expect to finish and merge it.
This is less an actual PR than a sketch of one possible way to do things. It’s not necessarily meant to merge (or at least not exactly as-is). Trying to figure out how to take what I did in edgi-govdata-archiving/web-monitoring-processing#551 and integrate it more smoothly here.
This implementation is kinda wonky, but is the best way I've come up with to support sessions/clients across multiple threads and pooling connections across multiple threads. It's based on the kind of hacky implementation in edgi-govdata-archiving/web-monitoring-processing#551.
The basic idea here includes two-and-a-half pieces, and works around the fact that urllib3 is thread-safe, while requests is not:
1. `WaybackSession` is renamed to `UnsafeWaybackSession` (denoting it should only be used on a single thread) and a new `WaybackSession` class just acts as a proxy to multiple `UnsafeWaybackSession`s, one per thread.
2. A special subclass of requests's `HTTPAdapter` that takes an instance of urllib3's `PoolManager` to wrap. `HTTPAdapter` itself is really just a wrapper around `PoolManager`, but it always creates a new one. This version just wraps whatever one is given to it.

`UnsafeWaybackSession` now takes a `PoolManager` as an argument, which, if provided, is passed to its `HTTPAdapter`. `WaybackSession` creates one `PoolManager` which it sets on all the actual `UnsafeWaybackSession` objects it creates and proxies access to. That way a single pool of requests is shared across many threads.

This is super wonky! It definitely makes me feel like we might just be better off dropping requests and just using urllib3 directly (especially given #2 -- which means requests wouldn't be part of our public interface in any way). But this is a smaller change that *probably* carries less short-term risk.
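
To make the adapter piece concrete, here is a minimal sketch of the idea, assuming `requests` and `urllib3` are installed. `SharedPoolAdapter` and `make_session` are hypothetical names chosen for this illustration and are not necessarily what the PR's code uses. The sketch relies on `HTTPAdapter.__init__` calling `init_poolmanager()`, which is the hook where the adapter normally builds its own `PoolManager`.

```python
from requests import Session
from requests.adapters import HTTPAdapter
from urllib3 import PoolManager


class SharedPoolAdapter(HTTPAdapter):
    """Adapter that wraps an existing PoolManager instead of creating one."""

    def __init__(self, pool_manager, **kwargs):
        # Must be set before super().__init__(), which calls init_poolmanager().
        self._shared_pool_manager = pool_manager
        super().__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        # Reuse the pool we were given; the sizing arguments are ignored here.
        self.poolmanager = self._shared_pool_manager

    def close(self):
        # Closing one session should not clear a pool other threads still use.
        pass


def make_session(pool_manager):
    """Build one single-thread session backed by the shared pool."""
    session = Session()
    adapter = SharedPoolAdapter(pool_manager)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Each thread would get its own session from `make_session(shared_pool)`, while the connections themselves live in the one `PoolManager`. Things this sketch skips: proxy managers are still created per adapter, pickling the adapter is not handled, and whoever creates the shared `PoolManager` is responsible for calling `clear()` on it when done.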