Future evolution of object storage #1454
It's been a while since I wrote this, and I wanted to post a quick update on how my thinking has developed since then. First, a correction. I wrote this:
And this turned out to be blatantly wrong. The core data structures are way too complex. I knew they were complex back when I wrote this, but I thought that complexity was inherent to the problem they solve, and thus a price we had to pay. I'm now convinced this is wrong; it was a lack of imagination on my part. I have some ideas on how to simplify the core data structures. I'm currently working on one of those ideas (#1525), and I have some more that I'll hopefully find the time to write about soon. With the correction out of the way, here are some more points where my thinking has evolved:
Update (2023-02-08): Some information in this post is outdated or outright wrong. See my update below.
With the recent introduction of the new partial object API (which replaced the old one), the API for creating shapes and the objects that constitute them has taken a big step forward. I think, as far as core data structures and APIs are concerned, we're in pretty good shape to take the next steps toward turning Fornjot into a useful product.
There are two caveats here:
The purpose of this discussion is to collect ideas for this future evolution. I decided not to open an issue for this, as these are not actionable improvements right now. Just ideas that might or might not become relevant, and that I don't want to forget in the meantime.
Current Situation
The new partial object API has brought partial and full objects more in line with each other. While a partial object in the old API was a completely blank slate, in the new API, a partial object is created with a fully correct object graph. For example, if you create a partial half-edge, it is structured like this:
This is exactly the same structure that a valid full object would have, and it's what makes creating a full object from a partial object so much easier than it was with the old API.
The big difference between partial and full objects (aside from mutability; see below) is that the geometry of this partial object graph can be completely undefined. The curve that defines the half-edge, where the vertices are on that curve, where anything is on the surface, and where the surface is in global space... none of that needs to be known to create a partial half-edge. All of that can be defined or inferred later, as the object is being constructed.
Topology vs. Geometry
There used to be a clear separation between topology (the thing that defines how objects are related in space) and geometry (the thing that defines where objects are) in Fornjot. This followed the literature I was reading, and it kind of made sense to me at the time. Since then, there seemed to be less and less point in making that distinction, and it is no longer being made.
I think the new partial object API has changed that, because partial and full objects are very similar as far as their topology is concerned. The object graph that defines the relations between objects is there from the beginning and is kept correct all through the construction of the object. Where they differ is on the geometry side, where partial objects can be wholly undefined.
What if we had a clear separation between topology and geometry, and stored them in different places? The topology side could stay much like it is now (although not completely; see my comments on mutability below), while geometry would be undefined by default, and could evolve over time. That would allow us to unify partial and full objects, meaning there would be no separate API for constructing objects. Making sure that the geometry is defined and valid would just become another part of validation.
Having this separation could also become useful once we implement constraint-based modeling. The constraint solver could get free rein over the geometry, while only needing read access to topology.
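To make the idea of storing topology and geometry in different places a bit more concrete, here is a minimal sketch. All names in it (`ObjectId`, `HalfEdge`, `Geometry`, `ValidationError`, `validate`) are made up for illustration and are not Fornjot's actual types. It only assumes that topology refers to other objects by ID, that geometry sits in a separate store that starts out empty, and that checking for defined geometry is just one more validation step.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a topological object.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ObjectId(usize);

/// Topology only: which objects relate to which, with no positions.
#[derive(Debug)]
struct HalfEdge {
    id: ObjectId,
    curve: ObjectId,
    vertices: [ObjectId; 2],
}

/// Geometry lives in its own store and is undefined (empty) by default.
#[derive(Debug, Default)]
struct Geometry {
    /// Position of each vertex on its curve, once known.
    vertex_positions: HashMap<ObjectId, f64>,
}

#[derive(Debug, PartialEq)]
enum ValidationError {
    /// The vertex has no position yet.
    UndefinedVertex(ObjectId),
    /// Both vertices of the half-edge sit at the same position.
    CoincidentVertices(ObjectId),
}

/// Checks that the geometry of a half-edge is both defined and valid.
fn validate(edge: &HalfEdge, geometry: &Geometry) -> Result<(), ValidationError> {
    let position = |id: ObjectId| {
        geometry
            .vertex_positions
            .get(&id)
            .copied()
            .ok_or(ValidationError::UndefinedVertex(id))
    };

    let [a, b] = edge.vertices;
    let (pos_a, pos_b) = (position(a)?, position(b)?);

    if pos_a == pos_b {
        return Err(ValidationError::CoincidentVertices(edge.id));
    }
    Ok(())
}
```

In a layout like this, a constraint solver would only need `&mut Geometry` plus shared references to the topology, which matches the read-only access described above.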
Identity and Mutability
When I built the current object storage system a few months ago, the big insight was that object identity is important, and that knowing about the identity of an object (and checking it during validation) would enable much more robustness in the CAD kernel. The big insight behind the new partial object API was that the same notion of identity could be applied to object construction too, which simplifies things a lot.
I already talked about differences between partial and full objects above. Another one that I haven't gone into so far, is mutability. Full objects are immutable, and object stores are append-only. Partial objects are mutable. They must be, as they are still being constructed. To go back to the example above, if you change the curve that one vertex of the half-edge references, you change the curve that the other one references too, because they are the same curve. This is realized by having a wrapper, `Partial`, that defines an object's identity and manages its state.
If we're going to unify the topology side of partial and full objects, that means objects will have to be mutable. That was already the case in the past, and I decided to remove it back then. I'm not 100% sure how I feel about re-introducing it now, but I think there are some factors that make it more palatable, this time around:
Managing Identity and State
This leaves the question of how to manage identity and its evolving state over time. The partial object API currently uses `RwLock` for that, which works but isn't great. First, you can run into panics if you're not careful (which is already an improvement over the deadlocks that could happen if I hadn't opted for the panics). Second, you end up with code like this:
That's a lot of visual noise (in the form of those `read` calls), just to access a struct field, and you might even need to store the results of those `read` (or `write`) calls in local variables, depending on the situation. This is made necessary by the nature of `RwLock` and the lifetimes of its read/write guards. Full objects, on the other hand, are referred to by `Handle`s, which implement `Deref`, making things much nicer. I believe there is an approach that allows for evolving the state of an object over time, while still keeping the `Deref` implementation.
I didn't realize this initially, but with this whole thing about identity and state, I was just reinventing Clojure. I figure, why not lean into that and try to adapt their notion of Refs (which is an implementation of STM/MVCC) to Rust?
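As a rough illustration of the contrast between `RwLock` guards and `Deref` described above, here is a small sketch. The `Partial` and `Handle` types below are simplified stand-ins written for this example, not the real Fornjot implementations, and `Curve`, `Vertex`, and `FullVertex` are likewise hypothetical.

```rust
use std::ops::Deref;
use std::sync::{Arc, RwLock};

/// Hypothetical curve type, just a named placeholder.
#[derive(Debug, Clone, PartialEq)]
struct Curve {
    name: String,
}

/// Stand-in for a mutable wrapper that defines identity and manages state.
struct Partial<T>(Arc<RwLock<T>>);

impl<T> Partial<T> {
    fn new(value: T) -> Self {
        Self(Arc::new(RwLock::new(value)))
    }
}

impl<T> Clone for Partial<T> {
    /// Cloning shares identity: both clones refer to the same object.
    fn clone(&self) -> Self {
        Self(Arc::clone(&self.0))
    }
}

struct Vertex {
    curve: Partial<Curve>,
}

/// Stand-in for an immutable handle that implements `Deref`.
struct Handle<T>(Arc<T>);

impl<T> Deref for Handle<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

struct FullVertex {
    curve: Handle<Curve>,
}

/// Every hop through a `Partial` needs a `read()` call, and the guards
/// have to be kept alive in local variables.
fn partial_curve_name(vertex: &Partial<Vertex>) -> String {
    let vertex = vertex.0.read().unwrap();
    let curve = vertex.curve.0.read().unwrap();
    curve.name.clone()
}

/// With `Deref`, the same access is plain field syntax.
fn full_curve_name(vertex: &Handle<FullVertex>) -> String {
    vertex.curve.name.clone()
}
```

Because cloning a `Partial` in this sketch shares the underlying lock, a change made through one vertex's curve is visible through any other vertex that references the same curve, which is the identity behavior described in the previous section.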
Here's my rough idea:
- [...] `AtomicUsize` (or similar) that points to the current version of the object.
- [...] `AtomicUsize` is updated to point to that. Old versions are kept around. I'm not sure how long. Maybe forever, maybe they are garbage-collected, or maybe some kind of double-buffering scheme with one old/new object each is sufficient. To be determined during the implementation.
- [...] `Deref` for the `Handle`, allowing for convenient read access.
- [...] `AtomicUsize`. If a transaction determines that another transaction has completed while it was underway, it tries again with the new value. Details to be determined, but software-transactional memory isn't exactly a new idea, so I'm sure it'll work out.
That's it for now. I don't know how much of that will work out as I imagine it, but I wanted to get it written down now, while the thoughts are still fresh.
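For what it's worth, the rough idea of a version index plus retrying transactions could look something like the sketch below. This is only one possible reading, under loud assumptions: the type name `Ref` is borrowed from Clojure and is hypothetical here, old versions are simply kept forever, commits are serialized with a lock instead of being lock-free, and reads hand out an `Arc<T>` snapshot (which implements `Deref`) rather than implementing `Deref` on the handle itself, to stay within safe Rust.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical versioned reference, loosely modeled on Clojure's Refs.
struct Ref<T> {
    /// Append-only history; old versions are kept forever in this sketch.
    versions: RwLock<Vec<Arc<T>>>,
    /// Index into `versions` of the current version.
    current: AtomicUsize,
}

impl<T> Ref<T> {
    fn new(value: T) -> Self {
        Self {
            versions: RwLock::new(vec![Arc::new(value)]),
            current: AtomicUsize::new(0),
        }
    }

    /// Index of the current version.
    fn version(&self) -> usize {
        self.current.load(Ordering::Acquire)
    }

    /// Returns a snapshot of the current version. The snapshot stays valid
    /// and unchanged even if the object is updated afterwards.
    fn read(&self) -> Arc<T> {
        let index = self.version();
        Arc::clone(&self.versions.read().unwrap()[index])
    }

    /// Runs `f` as an optimistic transaction: compute a new version from
    /// the one we saw, and retry if another transaction committed first.
    fn update(&self, f: impl Fn(&T) -> T) {
        loop {
            let seen = self.version();
            let old = Arc::clone(&self.versions.read().unwrap()[seen]);
            let new = Arc::new(f(&old));

            // Commit under the write lock, so check and store are atomic
            // with respect to other commits.
            let mut versions = self.versions.write().unwrap();
            if self.version() != seen {
                // Another transaction completed in the meantime; try again.
                continue;
            }
            versions.push(new);
            self.current.store(versions.len() - 1, Ordering::Release);
            return;
        }
    }
}
```

Note that `f` may run more than once if a transaction has to retry, so it should be free of side effects. Whether versions should be garbage-collected or double-buffered is left open here, just as it is in the list above.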