Shared read-only state between objects with copy on write #93
Comments
This is somewhat similar to RelStorage's in-memory pickle state cache, which is shared by all Connections of a Storage, but operating on the unpickled data (and then of course copying it). I like the idea! A challenge there is making such a shared cache effective with the different MVCC states that each Connection may be seeing. RelStorage has a complicated system of "checkpoints" it uses to accomplish this that works OK for short-lived transactions and Connections that don't drift too far apart from each other in terms of their MVCC state.
This cache would be keyed by oid + serial, so it would be orthogonal to MVCC. It would store Python objects, so there would be no additional deserialization overhead. Because the sharing would be at the object level, there would be memory savings, not just savings in loading objects.
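A minimal sketch of such a cache, assuming a loader callable that returns the unpickled state dict for a given oid and serial. The names `SharedStateCache` and `load_state` are hypothetical and are not part of ZODB or RelStorage; eviction and thread safety are left out.

```python
class SharedStateCache:
    """Hypothetical process-wide cache of unpickled, read-only state dicts."""

    def __init__(self, load_state):
        # load_state(oid, serial) -> dict is an assumed callable that
        # fetches and unpickles one specific revision of an object.
        self._load_state = load_state
        self._states = {}

    def get(self, oid, serial):
        # The key names an exact revision, so connections at different
        # MVCC snapshots simply ask for different keys.
        key = (oid, serial)
        state = self._states.get(key)
        if state is None:
            state = self._states[key] = self._load_state(oid, serial)
        return state
```

Two connections that see the same revision of an object get the very same dict back; a connection on an older snapshot asks for a different serial and gets a different entry.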
If we could store non-dicts as |
Ah, I see. It helps that the current laughably misnamed "pickle cache" knows what (oid, serial) values it's going to be requesting; the RelStorage case just has to deal with arbitrary requests over time.
Nice idea!
There's a lot of interest in using ZODB with asynchronous frameworks, especially for applications that block on network requests to services. From a purely programming perspective, gevent makes this quite tractable, but the cost of maintaining many open ZODB connections with their own caches is a major challenge. The cost of maintaining many open connections could be mitigated if data could be shared among their caches.
One way to do this would be to have a shared cache of read-only state objects. Consider the extremely common case of persistent objects that store their data in dictionaries (leaving aside non-persistent subobjects, for the sake of discussion). Set-state for such objects could simply use the shared state dict as the instance dictionary. The first attribute assignment on such an object would copy the state dict before writing to it. This would allow use of shared immutable state dicts, requiring no copying for read-only operations. Note that in this scenario, only state is shared, not persistent objects.
You could use slots or secondary dictionaries for non-shared mutable state.
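The copy-on-write scheme could look roughly like the following pure-Python sketch. `SharedStateObject` is a hypothetical stand-in, not a ZODB class, and it ignores change registration with the connection (`_p_changed`). A slot holds the per-object ownership flag so that it stays out of the shared dict.

```python
class SharedStateObject:
    """Hypothetical stand-in for a persistent object; not a ZODB class."""

    # The slot keeps the ownership flag out of the (possibly shared) __dict__.
    __slots__ = ("_owns_state", "__dict__")

    def __init__(self):
        # A freshly created object owns its (empty) instance dict.
        object.__setattr__(self, "_owns_state", True)

    def __setstate__(self, state):
        # Adopt the shared dict directly; readers never trigger a copy.
        object.__setattr__(self, "__dict__", state)
        object.__setattr__(self, "_owns_state", False)

    def _ensure_private_state(self):
        if not self._owns_state:
            # First write: switch to a private shallow copy so the
            # shared dict stays untouched.
            object.__setattr__(self, "__dict__", dict(self.__dict__))
            object.__setattr__(self, "_owns_state", True)

    def __setattr__(self, name, value):
        self._ensure_private_state()
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        self._ensure_private_state()
        object.__delattr__(self, name)
```

The copy is shallow, so this only covers the case set aside above: a mutable non-persistent subobject stored as a value in the dict would still be shared after the copy.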
Similar schemes could be used for BTrees and Buckets, although we'd need to introduce new Python subobjects to represent shared state.
To make this work, we'd likely want to create persistent subobjects that disallowed storing non-persistent mutable subobjects, which would have other benefits.