If I have a key that is associated with multiple values, and each value has an id, what would be the best way to delete a value given its id?
(Previous title: What would be the recommended way to model something like comments?)
I could model this as a MessagePack object, but this would mean that writing a new comment would involve reading and writing the entire post. I roughly expect the text of each post to be around 500 KB, if not more, so this seems bad for performance. Is there a recommendation for what to do here? I have some initial thoughts:
However, with this model, I do not know how to update a specific comment. I wanted to hear your thoughts on whether there's a more optimal way to model this.
This is a great question, and it is worth considering some of the different ways you can structure this in lmdb-js. I think you are on the right track here, but a lot of this depends on what you want to optimize for, so I will summarize some of the different approaches (which you may have already considered).
Embedded Arrays
Relational DB Approach
In this comment table, each comment has its own id, so comments can individually be updated and deleted, as needed.
And then when it comes time to retrieve/render your post with comments, you search the index for all the comment ids for a given post id, and then retrieve each comment by the id that you found in the index. There is a clear performance disadvantage for reading here: retrieving the set of all comments for a post requires a random read/get for each comment. However, this is good, solid, normalized database design. It would have excellent write performance, and would still have good read characteristics that would probably scale and perform as well as or better than any SQL database.
It is also worth noting that for entries larger than a couple hundred bytes, there is usually more time spent on deserialization than on actual database lookups (partly because LMDB is so fast), and this would have basically the same total deserialization cost as the first approach. Also, with post/comment apps, there might be good reason for keeping retrieval of posts as a separate action from retrieving comments. Typically you want to show a post as fast as possible, and then there is plenty of time to load comments later.
Multipart Keys
With this approach, your comments are now sequentially ordered, so that all the comments for a post can be retrieved with a single getRange query:
Since these are sequentially accessed in a single traversal, this is faster than doing random access retrievals for each comment. And since comments still have a unique id, you can delete/update existing comments:
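As a rough illustration of the multipart key idea, here is a minimal sketch that models an ordered store in memory. `OrderedStore` and `compareKeys` are hypothetical stand-ins written for this example, not the lmdb-js API; they only mimic the behavior described above, where `[postId, commentId]` keys sort element by element so that one range scan returns all comments for a post. In lmdb-js itself the corresponding operations would be `put`, `getRange` (with `start` and `end`), and `remove`.

```javascript
// In-memory stand-in for an ordered key-value store (NOT the lmdb-js API).
// Keys are [postId, commentId] arrays, compared element by element.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length; // a shorter key sorts before its extensions
}

class OrderedStore {
  constructor() {
    this.entries = []; // always kept sorted by key
  }

  // Insert a new entry, or overwrite the value if the key already exists.
  put(key, value) {
    const i = this.entries.findIndex((e) => compareKeys(e.key, key) >= 0);
    if (i === -1) this.entries.push({ key, value });
    else if (compareKeys(this.entries[i].key, key) === 0) this.entries[i].value = value;
    else this.entries.splice(i, 0, { key, value });
  }

  // Delete the entry with exactly this key; returns whether it existed.
  remove(key) {
    const i = this.entries.findIndex((e) => compareKeys(e.key, key) === 0);
    if (i === -1) return false;
    this.entries.splice(i, 1);
    return true;
  }

  // Entries with start <= key < end, in key order.
  getRange({ start, end }) {
    return this.entries.filter(
      (e) => compareKeys(e.key, start) >= 0 && compareKeys(e.key, end) < 0
    );
  }
}

const comments = new OrderedStore();
comments.put([2, 1], { text: 'other post' });
comments.put([1, 2], { text: 'second' });
comments.put([1, 1], { text: 'first' });

// All comments for post 1: keys from [1] up to (not including) [2].
const forPost1 = comments.getRange({ start: [1], end: [2] }).map((e) => e.value.text);

// Update one comment and delete another, each addressed by its full key.
comments.put([1, 2], { text: 'second (edited)' });
comments.remove([1, 1]);
const afterEdit = comments.getRange({ start: [1], end: [2] }).map((e) => e.value.text);
```

The range bounds rely on a one-element key such as `[1]` sorting before every `[1, commentId]` key, which is the property that keeps a post's comments contiguous.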
Caching
In regards to the second question:
If what you are asking is how to do this:
Technically, what you are asking for is possible:
However, you probably don't want to be storing arbitrary data objects as values in a dupSort database. On the other hand, if your comment index database just has values that are keys (and make sure to use
Or, if you use arrays as keys (the third approach above), it is even simpler:
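For the dupSort index case discussed above, where a post id maps to several comment ids and you want to delete just one of them, the behavior can be sketched with a small in-memory model. `DupSortIndex` is a hypothetical class for illustration only, not the lmdb-js API; it assumes string comment ids and mirrors the idea that `remove` can take both a key and the specific value to delete.

```javascript
// In-memory stand-in for a dupSort index (NOT the lmdb-js API):
// each key maps to a sorted list of distinct values.
class DupSortIndex {
  constructor() {
    this.map = new Map();
  }

  // Add a value under a key; duplicates of the same value are ignored.
  put(key, value) {
    const values = this.map.get(key) ?? [];
    if (!values.includes(value)) {
      values.push(value);
      values.sort(); // values under one key are kept in sorted order
    }
    this.map.set(key, values);
  }

  // Return a copy of all values stored under the key.
  getValues(key) {
    return [...(this.map.get(key) ?? [])];
  }

  // With a value: remove just that entry. Without: remove every value under the key.
  remove(key, value) {
    if (value === undefined) return this.map.delete(key);
    const values = this.map.get(key);
    if (!values) return false;
    const i = values.indexOf(value);
    if (i === -1) return false;
    values.splice(i, 1);
    if (values.length === 0) this.map.delete(key);
    return true;
  }
}

const commentIndex = new DupSortIndex();
commentIndex.put('post-1', 'comment-b');
commentIndex.put('post-1', 'comment-a');
commentIndex.put('post-2', 'comment-c');

// Delete a single comment id from post-1 without touching the others.
commentIndex.remove('post-1', 'comment-a');
const remaining = commentIndex.getValues('post-1');
```

Because the index values are just comment ids, the comment bodies themselves would live in a separate table keyed by comment id, as in the relational approach.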
Thanks for the excellent & comprehensive explanations! A few clarifying questions / thoughts:
Can I do
No, they can both/either be strings/UUIDs or any primitive values (as long as the total key size isn't more than 1978 bytes). And yes, they can skip around, be randomly generated, or whatever you want.
No, it's not the index, it's the actual value. You are correct that the docs incorrectly state the type as number (it has to be a number if you are specifying the version number, though, and the TS docs do have a correct entry for deleting by value of any type). I will fix that.
Yes, this will work fine and is a great choice. You can use UUID strings as the keys and values of dupSort databases (and inside key arrays).
It is worth noting that buffers are written directly as-is without any typing, which means they won't round trip. You can use them as keys for gets and puts, but if you do getRange() and try to read the keys, they won't be preserved as buffers (unless you use a custom encoder). So using strings would be easier. (I have been meaning to make a custom encoder for UUIDs sometime, so they can be written in their compact 16-byte glory.)
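To make the "compact 16-byte" point concrete, here is a small sketch of converting a UUID string to its 16 raw bytes and back using Node's built-in `Buffer`. The helper names `uuidToBytes` and `bytesToUuid` are hypothetical, and this only shows the byte conversion; how such a conversion would be wired into a custom lmdb-js key encoder is not shown and is not something the thread specifies.

```javascript
// Convert a canonical UUID string (with dashes) into its 16 raw bytes.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error(`Invalid UUID: ${uuid}`);
  return Buffer.from(hex, 'hex');
}

// Convert 16 raw bytes back into the lowercase 8-4-4-4-12 UUID string form.
function bytesToUuid(bytes) {
  if (bytes.length !== 16) throw new Error('Expected 16 bytes');
  const hex = Buffer.from(bytes).toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

const id = '123e4567-e89b-12d3-a456-426614174000';
const packed = uuidToBytes(id); // 16 bytes instead of a 36-character string
const restored = bytesToUuid(packed);
```

Until something like this is handled by an encoder, plain UUID strings remain the simpler choice, since they round trip through range queries without extra work.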
Embedded Arrays
As you have mentioned, comments can easily be embedded arrays, and this is great for smaller data structures that change less frequently. However, as you realized, if there are larger data structures with frequent additions/updates to the comments, the reading and writing of the post and all other comments can significantly increase the overhead for adding comments. How…