
DynamoDB eventstore, work in progress #118

Draft · jankronquist wants to merge 1 commit into master

Conversation

jankronquist

No description provided.

```java
try {
    dynamoDB.transactWriteItems(TransactWriteItemsRequest.builder()
```


Beware TransactWriteItems (TWI) - it costs double what a standard operation costs (obviously the Equinox.DynamoStore schema involves much more logic as a result of using UpdateItem instead). See jet/equinox#327 for my learnings from going down this road.
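The doubling follows directly from DynamoDB's capacity-unit rules: a standard write consumes 1 WCU per 1 KB of item size (rounded up), while a transactional write consumes twice that. A minimal sketch of the arithmetic (the helper names here are illustrative, not any library's API):

```java
// Sketch of DynamoDB write-capacity arithmetic, per the published
// capacity-unit rules: 1 WCU per 1 KB (rounded up) for a standard write,
// 2x that when the write goes through TransactWriteItems.
public class WcuCost {
    // WCUs consumed by a standard (non-transactional) write of `bytes` item size.
    static long standardWriteWcu(long bytes) {
        return (bytes + 1023) / 1024; // round up to the next whole KB
    }

    // WCUs consumed by the same write issued via TransactWriteItems.
    static long transactionalWriteWcu(long bytes) {
        return 2 * standardWriteWcu(bytes);
    }

    public static void main(String[] args) {
        // A 1.5 KB event item: 2 WCUs standard, 4 WCUs transactional.
        System.out.println(standardWriteWcu(1536));      // 2
        System.out.println(transactionalWriteWcu(1536)); // 4
    }
}
```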

jankronquist (Author)

Yes, I'm aware of this, and how to write will most likely be a configuration option. Ideally I would like appending events to an event stream to be an atomic operation. Initially I had one row per transaction (i.e. several events), but I had to change this in order to conform to what seems to be the rule in occurrent that every single event should increment the version by one.

@johanhaleby Related to this, I was confused by this: EventStream read(String streamId, int skip, int limit);

Should skip be the number of events to skip, or should it be the version number? If it's the version number, shouldn't this be a long? Does the version number have to equal the number of events?

When doing event sourcing I usually consider all the events generated by a command to be an atomic update of the event stream, i.e. having versions between the start and the end of the transaction does not necessarily make sense.


I had one row per transaction (ie several events), but then I had to change this in order to conform to what seems to be the rule in occurrent that every single event should increment the version by one.

Yeah the problem/tradeoff is that the minute you try to fulfil the transactional correctness requirement, you run into a set of problems:

  • TransactWriteItems doubles the charges for everything, which is a massive loss
  • You need something useful to make the write contingent on (i.e. the expected version etc.) - if you instead check that the previous item is present and the one you are writing is not, even the basic coding gets complex

In Equinox, the schema resolves these forces via the notion of a Tip per stream, through which all writes are gated:

  • one could keep an event counter in it, but I use an etag string (this allows rolling transactionally correct updates without having to write a new event every time). When you are writing event 0, the condition is that the Tip does not exist
  • if you are persisting multiple events in one write, they can all get appended in a single Put/Update call

In addition to working for larger cases, it has the following key properties for normal use:

  • small streams are a single item that can be loaded via a single GetItem roundtrip
  • TransactWriteItems only becomes required when the tip overflows
  • minimum storage overhead
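The Tip-gating described above can be sketched with a toy in-memory model. Everything here (`TipStore`, `Tip`, `append`) is illustrative and not Equinox's or Occurrent's actual API; a real implementation would issue a conditional PutItem/UpdateItem against DynamoDB with a ConditionExpression on the etag (or `attribute_not_exists` when writing event 0):

```java
import java.util.*;

// Toy in-memory model of a Tip-per-stream gate. All names are illustrative;
// the real thing would be a DynamoDB conditional write, with the etag check
// (or attribute_not_exists for a new stream) as the ConditionExpression.
public class TipStore {
    static class Tip {
        String etag;                               // changes on every successful write
        List<String> events = new ArrayList<>();
    }

    private final Map<String, Tip> tips = new HashMap<>();

    // Append a batch of events atomically. `expectedEtag` is null when the
    // caller expects the stream not to exist yet (i.e. writing event 0).
    // Returns the new etag on success, or null on a concurrency conflict.
    public synchronized String append(String streamId, String expectedEtag, List<String> events) {
        Tip tip = tips.get(streamId);
        if (expectedEtag == null) {
            if (tip != null) return null;          // condition: Tip must not exist
            tip = new Tip();
            tips.put(streamId, tip);
        } else if (tip == null || !expectedEtag.equals(tip.etag)) {
            return null;                           // condition: etag must match
        }
        tip.events.addAll(events);                 // whole batch lands in one "write"
        tip.etag = UUID.randomUUID().toString();
        return tip.etag;
    }

    public synchronized List<String> read(String streamId) {
        Tip tip = tips.get(streamId);
        return tip == null ? List.of() : List.copyOf(tip.events);
    }
}
```

Note how a multi-event batch either lands entirely or not at all, without any transactional API: the single conditional write on the Tip is the atomicity boundary.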

One thing it does complicate is that the DDB Streams output will emit a full copy of the Tip for every update. E.g. if you are adding 2 events to a Tip that already holds one event, the DDB Streams output will be a single event carrying the full Item (which holds 3 events, but only 2 are new).

The other thing to bear in mind is that having >1 event per item means you need a good story for when you are writing 201 KB of events on top of 200 KB of existing events (DynamoDB items are capped at 400 KB).
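That overflow decision can be sketched as a simple size check (the policy and names here are illustrative, not how any particular store implements it): before appending a batch to the current Tip, check whether the combined payload would still fit under DynamoDB's 400 KB per-item limit; if not, the Tip must be frozen out into an archive item and a fresh Tip started.

```java
public class TipOverflow {
    // DynamoDB's hard per-item size limit.
    static final int MAX_ITEM_BYTES = 400 * 1024;

    // Illustrative policy: a batch may join the current Tip only if the
    // combined payload stays within the item limit; otherwise the Tip is
    // "calved" (archived) and a fresh Tip begins with the new batch.
    static boolean fitsInTip(int existingBytes, int newBytes) {
        return existingBytes + newBytes <= MAX_ITEM_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsInTip(200 * 1024, 100 * 1024)); // true
        System.out.println(fitsInTip(200 * 1024, 201 * 1024)); // false: must overflow
    }
}
```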

I would caution against having a mode switch in your implementation; testing, reasoning and talking about the code becomes a nightmare. Better to have a single impl that can deal with your use cases efficiently and test, tune and validate that. (The other reason I say that is that I fundamentally believe that an event per Item schema is just worse than useless in terms of cost and efficiency too)

Should skip be the number of events to skip or should this be the version number? If its the version number, shouldnt this be a long? Does the version number have to equal the number of events?

I use longs for event indexes; ESDB etc. do too. In practice the CUs and latency it costs to read more than 2M events make it irrelevant (and there are fixed limits to how much can be held in a logical partition (10 GB, is it?), so any design that is predicated on unlimited stream lengths is not even theoretically implementable).

@johanhaleby (Owner)

Nice work Jan!

For everyone's info, you also sent me a private email, that I answered and wrote some comments on. We can continue the discussion by email or here, whatever suits you best :)

@jankronquist (Author)

FYI: this is very experimental and a way for me to learn more about occurrent; it will of course need lots more configuration options to be fully usable. I'm going on vacation for a few weeks, but I will pick this up later! At this point I just wanted to share what I have done so far ...

@johanhaleby johanhaleby mentioned this pull request Sep 21, 2022
3 participants