-
Yes, I have wrestled with different approaches to deferring parsing and parsing on demand, and have implemented and explored such approaches in dpack. In our applications, we store a lot of study data, and there are situations where we want to access some top-level information about a study, like authors and publication data, without having to load all the study result data (often a few MBs of structured data).
There are a few considerations that have driven my current approach. First, it is certainly easiest to deal with plain JS objects, where all properties are accessed as simple properties. It is also a lot easier in our applications not to have to define a rigid schema/structure upfront. We have a lot of dynamic properties, and it would be challenging to have to define schemas for everything. Likewise, with lmdb-store, I have tried to reduce barriers to use.
With dpack, I did implement lazy/on-demand parsing of data using JS Proxies. This certainly is nice from an API perspective: objects can still be treated as plain objects with properties, while property access is intercepted and data is parsed on demand. However, the drawback is that proxies are much slower than plain objects. So while there is a lot of efficiency to be gained by deferring or avoiding parsing of some embedded data, a lot is lost in plain access to data that has already been parsed. There is also a fair bit of complexity involved, especially once you start modifying objects. So it is certainly possible to improve performance by carefully structuring objects and defining which objects or properties should be lazily parsed and which shouldn't, but in the end it seemed to require a lot of careful optimization and structure definitions to outperform plain objects. And with msgpackr and plain objects on top of V8's amazingly fast performance, the performance is just so good that it is a tough bar to beat.
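For readers who have not used the technique, here is a minimal sketch of Proxy-based on-demand parsing. It is not dpack's actual implementation: `lazyRecord` is a hypothetical helper, and JSON strings stand in for whatever binary encoding the store would really use. It only illustrates how property access can be intercepted so that an embedded field is decoded on first use and cached afterwards.

```javascript
// Hypothetical helper (not a dpack API): wraps eagerly parsed fields and
// decodes the still-encoded fields only when they are first accessed.
function lazyRecord(eagerFields, encodedFields) {
  const decoded = new Map(); // cache of fields that have already been parsed
  const stats = { parses: 0 }; // counts how many times a decode actually ran
  const isLazy = (key) =>
    typeof key === 'string' &&
    Object.prototype.hasOwnProperty.call(encodedFields, key);

  const record = new Proxy(eagerFields, {
    get(target, key) {
      if (isLazy(key)) {
        if (!decoded.has(key)) {
          stats.parses++;
          // JSON is an assumption here; a real store would use its own decoder.
          decoded.set(key, JSON.parse(encodedFields[key]));
        }
        return decoded.get(key);
      }
      return target[key];
    },
    has(target, key) {
      return key in target || isLazy(key);
    },
  });
  return { record, stats };
}

const { record, stats } = lazyRecord(
  { title: 'Study A', authors: ['Kim', 'Lee'] },
  { results: JSON.stringify({ rows: [1, 2, 3] }) }
);

console.log(record.title, stats.parses); // top-level access triggers no decode
console.log(record.results.rows.length, stats.parses); // first access decodes once
```

Every property read on `record` goes through the `get` trap, including reads of already-parsed fields, which is where the overhead relative to plain objects comes from.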
And in the current iteration of our application, where I really need to defer parsing of certain related objects, I simply store them in a separate table (a db in LMDB) and load them separately. With LMDB, get()s are so ridiculously fast that making everything I really want to parse separately into separately addressable entities in the database has seemed like the best approach, while keeping the overall serialization/deserialization process simple and easy to understand. But there certainly is a place for schemes that involve more randomly accessible data with on-demand parsing. I just haven't felt that it has been worth the effort in our application. Also, relatedly, dpack was heavily optimized for browser usage, to work well in conjunction with the Huffman encoding used by compression algorithms. This is still kind of an issue with MessagePack: it doesn't compress well, so we actually just use JSON for communicating with the browser.
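The separate-table idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the application's real code: two `Map` objects stand in for two LMDB databases, JSON stands in for the serializer, and `putStudy`, `getStudySummary`, and `getStudyResults` are hypothetical names.

```javascript
// Two Maps stand in for two LMDB dbs ("tables"); a real application would
// call the store's own put()/get() instead.
const studies = new Map(); // small top-level records
const studyResults = new Map(); // large result payloads, keyed by the same id
let resultParses = 0; // counts how often the large payload is decoded

function putStudy(id, study) {
  // Split the large part off so it becomes a separately addressable entry.
  const { results, ...summaryFields } = study;
  studies.set(id, JSON.stringify(summaryFields));
  if (results !== undefined) {
    studyResults.set(id, JSON.stringify(results));
  }
}

function getStudySummary(id) {
  const encoded = studies.get(id);
  return encoded === undefined ? undefined : JSON.parse(encoded);
}

function getStudyResults(id) {
  const encoded = studyResults.get(id);
  if (encoded === undefined) return undefined;
  resultParses++;
  return JSON.parse(encoded);
}

putStudy('s1', {
  title: 'Study A',
  authors: ['Kim', 'Lee'],
  results: { rows: [1, 2, 3] },
});

const summary = getStudySummary('s1'); // decodes only the small record
console.log(summary.title, resultParses);
```

Both kinds of record stay plain objects after decoding, so no proxy or schema is needed; the cost is one extra lookup when the large payload is actually wanted.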
-
No, I think you have a good idea of the issues (I thought the issue with having to detach buffers before the next read to the same location was the most onerous to track/deal with).
No, not as written, but you can certainly use this approach if you copy the provided buffer so you have a stable copy (again recognizing there is some cost to this):
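As a rough illustration of the copying idea only (not the snippet from the original reply), the sketch below uses a reused scratch buffer to simulate a store that overwrites the same memory on each read. `readInto` is a hypothetical stand-in for such a read; `Buffer.from(buffer)` is the standard Node.js call that copies the bytes into a new buffer.

```javascript
// A single reused buffer simulates memory that the next read will overwrite.
const scratch = Buffer.alloc(16);

// Hypothetical stand-in for a read that returns a view onto shared memory.
function readInto(text) {
  scratch.fill(0);
  const length = scratch.write(text);
  return scratch.subarray(0, length); // a view, not a copy
}

const view = readInto('first');
const stableCopy = Buffer.from(view); // copies the bytes (this is the cost)

readInto('second'); // overwrites the memory that `view` still points at

console.log(view.toString()); // no longer 'first'
console.log(stableCopy.toString()); // still 'first'
```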
Or alternately you could use
-
@kriszyp This is probably off-topic, but I am curious to hear your thoughts on using some binary format for accessing data directly in an `lmdb` value without deserializing/copying it to the node heap, for instance flatbuffers or capnproto.
The use case: imagine you have a deeply nested object with large strings in some fields, but you only need to read one top-level field of this object at runtime. With serialization/deserialization there will be significant overhead (memory, parsing, etc.). So a binary format that is directly accessible without deserialization sounds like a good idea (at first glance).
I've seen your cool DPack project that sounds like a pretty similar idea (with blocks, lazy parsing, etc).
But it looks like you've decided to focus on MessagePack/CBOR instead. Did you discover some pitfalls with the approach taken in DPack? Also, did you consider existing encodings like `flatbuffers`?