Future evolution of object storage #1454

hannobraun · 2022-12-16T11:37:05Z

hannobraun
Dec 16, 2022
Maintainer

Update (2023-02-08): Some information in this post is outdated or outright wrong. See my update below.

With the recent introduction of the new partial object API (which replaced the old one), the API for creating shapes and the objects that constitute them has taken a big step forward. I think, as far as core data structures and APIs are concerned, we're in a pretty good shape to take the next steps toward turning Fornjot into a useful product.

There are two caveats here:

I might be wrong, and more fundamental problems might still be lurking.
Even if I'm right, the current structure won't serve us forever. New features will pose new problems, and those will have to be addressed by further evolving the core data structures and APIs.

The purpose of this discussion is to collect ideas for this future evolution. I decided not to open an issue for this, as these are not actionable improvements right now. Just ideas that might or might not become relevant, and that I don't want to forget in the meantime.

Current Situation

The new partial object API has brought partial and full objects more in line with each other. While a partial object in the old API was a completely blank slate, in the new API, a partial object is created with a fully correct object graph. For example, if you create a partial half-edge, it is structured like this:

The two (partial) vertices of the (partial) half-edge both reference the same (partial) curve.
The (partial) half-edge references a (partial) global edge, which in turn references (partial) global vertices. Those are the same as the (partial) global vertices referenced by the (partial) vertices.
The (partial) global edge references the same (partial) global curve as the (partial) curve that the (partial) vertices reference.

This is exactly the same structure that a valid full object would have, and it's what makes creating a full object from a partial object so much easier than it was with the old API.

The big difference between partial and full objects (aside from mutability; see below) is that the geometry of this partial object graph can be completely undefined. The curve that defines the half-edge, where the vertices are on that curve, where anything is on the surface, and where the surface is in global space: none of that needs to be known to create a partial half-edge. All of that can be defined or inferred later, as the object is being constructed.
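A minimal sketch of what "undefined geometry" could look like in code. The types below are simplified stand-ins made up for illustration (a curve reduced to a direction, vertices reduced to positions on it), not Fornjot's actual definitions. The partial object has the same shape as the full one, but every piece of geometry is optional, and building the full object only succeeds once everything has been defined.

```rust
// Simplified stand-ins for illustration; not Fornjot's actual types.

/// Geometry of a partial half-edge. Every field may still be undefined.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct PartialHalfEdge {
    /// Curve the half-edge lies on, reduced here to a line's direction.
    pub curve_direction: Option<[f64; 2]>,
    /// Positions of the two vertices on that curve.
    pub vertex_positions: [Option<f64>; 2],
}

/// A full half-edge: same shape, but all geometry is known.
#[derive(Clone, Debug, PartialEq)]
pub struct HalfEdge {
    pub curve_direction: [f64; 2],
    pub vertex_positions: [f64; 2],
}

impl PartialHalfEdge {
    /// Succeeds only once every piece of geometry has been defined.
    pub fn build(&self) -> Option<HalfEdge> {
        let [a, b] = self.vertex_positions;
        Some(HalfEdge {
            curve_direction: self.curve_direction?,
            vertex_positions: [a?, b?],
        })
    }
}

/// Usage: start undefined, fill in geometry step by step, then build.
pub fn example() -> Option<HalfEdge> {
    let mut partial = PartialHalfEdge::default();
    assert!(partial.build().is_none()); // nothing is defined yet

    partial.curve_direction = Some([1.0, 0.0]);
    partial.vertex_positions = [Some(0.0), Some(1.0)];
    partial.build()
}
```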

Topology vs. Geometry

There used to be a clear separation between topology (the thing that defines how objects are related in space) and geometry (the thing that defines where objects are) in Fornjot. This followed the literature I was reading, and it kind of made sense to me at the time. Since then, there seemed to be less and less point in making that distinction, and it is no longer being made.

I think the new partial object API has changed that, because partial and full objects are very similar as far as their topology is concerned. The object graph that defines the relations between objects is there from the beginning and kept correct all through the construction of the object. Where they differ is on the geometry side, where partial objects can be wholly undefined.

What if we had a clear separation between topology and geometry, and stored them in different places? The topology side could stay much like it is now (although not completely; see my comments on mutability below), while geometry would be undefined by default, and could evolve over time. That would allow us to unify partial and full objects, meaning there would be no separate API for constructing objects. Making sure that the geometry is defined and valid would just become another part of validation.

Having this separation could also become useful once we implement constraint-based modeling. The constraint solver could get free rein over the geometry, while only needing read access to topology.
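To make the separation concrete, here is a rough sketch under heavy assumptions: topology is reduced to edges between vertex IDs, geometry to a map from vertex ID to position. All names (`VertexId`, `Topology`, `Geometry`, `undefined_vertices`) are hypothetical. The point is only that geometry lives in a different place, starts out empty, and that "is everything defined?" becomes an ordinary validation check.

```rust
use std::collections::HashMap;

// Hypothetical names throughout; a sketch of the idea, not Fornjot's API.

/// Identity of a vertex in the topology graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct VertexId(pub usize);

/// Topology: which vertices bound which edge. Always fully defined.
#[derive(Debug, Default)]
pub struct Topology {
    pub edges: Vec<[VertexId; 2]>,
}

/// Geometry: where vertices are. Stored separately, undefined by default.
#[derive(Debug, Default)]
pub struct Geometry {
    pub positions: HashMap<VertexId, [f64; 3]>,
}

/// Validation check: every vertex referenced by the topology needs a
/// position. Returns the vertices whose geometry is still undefined.
pub fn undefined_vertices(topology: &Topology, geometry: &Geometry) -> Vec<VertexId> {
    let mut missing = Vec::new();
    for edge in &topology.edges {
        for vertex in edge {
            if !geometry.positions.contains_key(vertex) && !missing.contains(vertex) {
                missing.push(*vertex);
            }
        }
    }
    missing
}
```

With this split, a constraint solver could be given `&Topology` and `&mut Geometry`, so the borrow checker would enforce the "read-only topology" part.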

Identity and Mutability

When I built the current object storage system a few months ago, the big insight was that object identity is important, and that knowing about the identity of an object (and checking it during validation) would enable much more robustness in the CAD kernel. The big insight behind the new partial object API was that the same notion of identity could be applied to object construction too, which simplifies things a lot.

I already talked about differences between partial and full objects above. Another one that I haven't gone into so far is mutability. Full objects are immutable, and object stores are append-only. Partial objects are mutable. They must be, as they are still being constructed. To go back to the example above, if you change the curve that one vertex of the half-edge references, you change the curve that the other one references too, because they are the same curve. This is realized by having a wrapper, Partial, that defines an object's identity and manages its state.
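The shared-curve behavior can be sketched as follows. This `Partial` is a simplified stand-in built on `Arc<RwLock<T>>`; the method names (`is_same_as`, `read`, `write`) and the reduced `PartialCurve` and `PartialVertex` types are assumptions for illustration, not the actual implementation.

```rust
use std::sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard};

// A simplified stand-in for the `Partial` wrapper; the fields and methods
// here are assumptions for illustration, not Fornjot's implementation.

/// Shared, mutable state. Clones refer to the same object (same identity).
#[derive(Debug, Default)]
pub struct Partial<T>(Arc<RwLock<T>>);

impl<T> Clone for Partial<T> {
    fn clone(&self) -> Self {
        Partial(Arc::clone(&self.0))
    }
}

impl<T> Partial<T> {
    pub fn new(value: T) -> Self {
        Partial(Arc::new(RwLock::new(value)))
    }

    /// Two partials are the same object if they share one allocation.
    pub fn is_same_as(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }

    pub fn read(&self) -> RwLockReadGuard<'_, T> {
        self.0.read().expect("lock poisoned")
    }

    pub fn write(&self) -> RwLockWriteGuard<'_, T> {
        self.0.write().expect("lock poisoned")
    }
}

#[derive(Debug, Default)]
pub struct PartialCurve {
    pub direction: Option<[f64; 2]>,
}

#[derive(Debug)]
pub struct PartialVertex {
    pub curve: Partial<PartialCurve>,
}

/// Both vertices of a half-edge reference the same curve.
pub fn half_edge_vertices() -> [PartialVertex; 2] {
    let curve = Partial::new(PartialCurve::default());
    [
        PartialVertex { curve: curve.clone() },
        PartialVertex { curve },
    ]
}
```

Writing through one vertex, for example `a.curve.write().direction = Some([1.0, 0.0])`, is then visible through the other, because both hold the same `Partial`.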

If we're going to unify the topology side of partial and full objects, that means objects will have to be mutable. That was already the case in the past, and I decided to remove it back then. I'm not 100% sure how I feel about re-introducing it now, but I think there are some factors that make it more palatable, this time around:

We could use a more principled approach to it, that avoids many of the pitfalls around shared mutable state. See below.
For the longest time, I imagined the kernel basically as a batch processing system, and I no longer think this is appropriate. It needs to be more dynamic, and I believe that means that it needs a notion of evolving the state of objects, while keeping their identity.
Being able to unify partial and full objects would certainly be a big entry on the plus side.

Managing Identity and State

This leaves the question of how to manage identity and its evolving state over time. The partial object API currently uses RwLock for that, which works but isn't great. First, you can run into panics if you're not careful (which is already an improvement over the deadlocks that could happen if I hadn't opted for the panics). Second, you end up with code like this:

vertex.read().surface_form.read().global_form

That's a lot of visual noise (in the form of those read calls), just to access a struct field, and you might even need to store the results of those read (or write) calls in local variables, depending on the situation. This is made necessary by the nature of RwLock and the lifetimes of its read/write guards. Full objects, on the other hand, are referred to by Handles, which implement Deref, making things much nicer. I believe there is an approach that allows for evolving the state of an object over time, while still letting us keep the Deref implementation.
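For contrast, here is roughly what the `Deref` side looks like. This is a sketch, not the real thing: the actual `Handle` refers into an object store, while this one wraps an `Arc` to stay self-contained, and the object types are reduced to the fields from the access chain above.

```rust
use std::{ops::Deref, sync::Arc};

// Simplified stand-ins; the real `Handle` refers into an object store. Using
// `Arc` here is an assumption made to keep the sketch self-contained.

/// A reference to an immutable object. `Deref` gives direct field access.
#[derive(Debug)]
pub struct Handle<T>(Arc<T>);

impl<T> Handle<T> {
    pub fn new(object: T) -> Self {
        Handle(Arc::new(object))
    }
}

impl<T> Deref for Handle<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

#[derive(Debug, PartialEq)]
pub struct GlobalVertex {
    pub position: [f64; 3],
}

#[derive(Debug)]
pub struct SurfaceVertex {
    pub global_form: Handle<GlobalVertex>,
}

#[derive(Debug)]
pub struct Vertex {
    pub surface_form: Handle<SurfaceVertex>,
}

/// No `read()` calls and no guards to keep alive: auto-deref does the work.
pub fn global_position(vertex: &Handle<Vertex>) -> [f64; 3] {
    vertex.surface_form.global_form.position
}
```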

I didn't realize this initially, but with this whole thing about identity and state, I was just reinventing Clojure. I figure, why not lean into that and try to adapt their notion of Refs (which is an implementation of STM/MVCC) to Rust?

Here's my rough idea:

Each object has a cell in an object store, but instead of the object itself, we have an AtomicUsize (or similar) that points to the current version of the object.
When an object is modified, a new version of its state is created, and the AtomicUsize is updated to point to that. Old versions are kept around. I'm not sure how long. Maybe forever, maybe they are garbage-collected, or maybe some kind of double-buffering scheme with one old/new object each is sufficient. To be determined during the implementation.
If each version of an object has a stable address in memory, we can keep implementing Deref for the Handle, allowing for convenient read access.
Updating an object's state only happens through transactions, which are represented by a closure. Transactions are optimistic and synchronized through the AtomicUsize. If a transaction determines that another transaction has completed while it was underway, it tries again with the new value. Details to be determined, but software-transactional memory isn't exactly a new idea, so I'm sure it'll work out.
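A rough sketch of how such a cell might look, with plenty of assumptions. `VersionedCell` and its methods are hypothetical names. To keep it short, the list of versions sits behind a `Mutex` (a real implementation would likely want something lock-free), old versions are kept forever, and `Arc` is what keeps each version at a stable address. The `AtomicUsize` holds the index of the current version, and a transaction only publishes its result if that index is still the one it started from.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical sketch. For brevity, the version list sits behind a `Mutex`,
// and old versions are kept forever.

/// One cell of an object store: all versions, plus the current one's index.
pub struct VersionedCell<T> {
    versions: Mutex<Vec<Arc<T>>>,
    current: AtomicUsize,
}

impl<T> VersionedCell<T> {
    pub fn new(initial: T) -> Self {
        VersionedCell {
            versions: Mutex::new(vec![Arc::new(initial)]),
            current: AtomicUsize::new(0),
        }
    }

    /// Snapshot of the current version, paired with its index.
    fn snapshot(&self) -> (usize, Arc<T>) {
        let index = self.current.load(Ordering::Acquire);
        let versions = self.versions.lock().expect("lock poisoned");
        (index, Arc::clone(&versions[index]))
    }

    /// The current version. `Arc` keeps its address stable while in use.
    pub fn get(&self) -> Arc<T> {
        self.snapshot().1
    }

    /// Optimistic transaction: compute a new version from a snapshot, then
    /// publish it only if no other transaction completed in the meantime.
    pub fn transact(&self, update: impl Fn(&T) -> T) -> Arc<T> {
        loop {
            let (seen, old) = self.snapshot();
            let new = Arc::new(update(&old));

            let new_index = {
                let mut versions = self.versions.lock().expect("lock poisoned");
                versions.push(Arc::clone(&new));
                versions.len() - 1
            };

            let published = self.current.compare_exchange(
                seen,
                new_index,
                Ordering::AcqRel,
                Ordering::Acquire,
            );
            if published.is_ok() {
                return new;
            }
            // Lost the race: retry with the value the other transaction
            // wrote. The version pushed above stays behind as garbage.
        }
    }
}
```

Because the closure may run more than once, it is taken as `Fn` and should be free of side effects, which is the usual restriction for optimistic transactions.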

That's it for now. I don't know how much of that will work out as I imagine it, but I wanted to get it written down now, while the thoughts are still fresh.

hannobraun · 2023-02-07T10:34:08Z

hannobraun
Feb 7, 2023
Maintainer Author

It's been a while since I wrote this, and I wanted to post a quick update on how my thinking has developed since then.

First, a correction. I wrote this:

I think, as far as core data structures and APIs are concerned, we're in a pretty good shape to take the next steps toward turning Fornjot into a useful product.

And this turned out to be blatantly wrong. The core data structures are way too complex. I knew that they were complex back when I wrote this, but I thought that complexity was inherent to the problem they solve, and thus a price we had to pay. I'm now convinced this is wrong. A lack of imagination on my part.

I have some ideas on how to simplify the core data structures. I'm currently working on one of those ideas (#1525), and I have some more that I'll hopefully find the time to write about soon.

With the correction out of the way, here are some more points where my thinking evolved:

Topology vs. Geometry: I now doubt that making a strong distinction between always-defined topology and possibly-undefined geometry makes sense. Maybe it does, maybe it doesn't, but I'm currently thinking more along the lines of simplifying the core data structures, reducing or (if possible) completely removing any redundancy, then having a system that can infer any of the redundant data that was previously stored in the data structures. If that works out, it could reduce (maybe even completely remove?) the need for having undefined data in the object graph.
Identity and Mutability: If it's possible to radically simplify the core data structures, as I hope, then we might not need anything to be mutable. If the object graph were simpler, it could instead be practical to propagate a change through the whole graph, assigning a new identity to each object that changed or that references something that changed. In fact, if we're inferring redundant data (which would require some kind of ID-based cache, to prevent issues with numerical stability), then we might need objects to be immutable.
Managing Identity and State: I think this is a valid idea, but it's probably more complicated than we need it to be. Even if we can't do away with mutable objects (see previous items), our current architecture still has a central entity (the objects service) that owns the object data structure. We can thus channel all mutation of objects through this service, which means we won't need STM and optimistic transactions (for the time being, at least).
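The last point, channeling all mutation through one owner, can be sketched like this. The names (`ObjectsService`, `Command`, `ObjectId`) and the reduction of an object to a bare position are assumptions for illustration; this is not how the actual objects service is defined.

```rust
// Hypothetical names; a sketch of the idea rather than Fornjot's actual
// objects service.

/// Index of an object within the service's store.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectId(pub usize);

/// The only ways in which object state may change.
#[derive(Debug)]
pub enum Command {
    Insert { position: [f64; 3] },
    Move { id: ObjectId, to: [f64; 3] },
}

/// Central owner of all object data.
#[derive(Debug, Default)]
pub struct ObjectsService {
    positions: Vec<[f64; 3]>,
}

impl ObjectsService {
    /// All mutation goes through here. With a single owner and `&mut self`,
    /// the borrow checker already rules out concurrent writers.
    pub fn execute(&mut self, command: Command) -> Option<ObjectId> {
        match command {
            Command::Insert { position } => {
                self.positions.push(position);
                Some(ObjectId(self.positions.len() - 1))
            }
            Command::Move { id, to } => {
                let position = self.positions.get_mut(id.0)?;
                *position = to;
                Some(id)
            }
        }
    }

    /// Read access needs no command.
    pub fn position(&self, id: ObjectId) -> Option<[f64; 3]> {
        self.positions.get(id.0).copied()
    }
}
```

Since every write takes `&mut self` on the one service, there is nothing left for optimistic transactions to arbitrate, at least as long as there is a single owner.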
