Mention DynamoDB Streams as a means to access the event log as a stream #76
Comments
We have no experience with these on the team, so it would be hard for us to vet it, but if you have such experience a guide could be helpful. We could mark it as hints from the community, or something similar, to make clear that the Akka team does not maintain or support it.
I do have some experience with it, and it's very easy to use with AWS Lambda (and there are tons of examples from AWS). AWS also provides good ordering guarantees.
References:
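To make the Lambda route concrete, here is a minimal sketch of a handler attached to the journal table's stream. The payload attribute name `pay` is an assumption about the plugin's journal item layout, so verify it against the actual table schema:

```scala
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent
import scala.jdk.CollectionConverters._

// Sketch of a Lambda handler attached to the journal table's DynamoDB stream.
class JournalStreamHandler extends RequestHandler[DynamodbEvent, Unit] {
  override def handleRequest(event: DynamodbEvent, context: Context): Unit =
    event.getRecords.asScala.foreach { record =>
      // Only INSERTs represent new journal entries.
      if (record.getEventName == "INSERT") {
        val newImage = record.getDynamodb.getNewImage
        // "pay" is assumed to be the binary attribute holding the serialized
        // event payload; check the journal table layout for the real name.
        Option(newImage.get("pay")).foreach { payload =>
          context.getLogger.log(s"New journal entry: ${payload.getB.remaining} bytes")
          // deserialize and forward to the downstream consumer here
        }
      }
    }
}
```

Note that the stream only guarantees ordering per shard, which is what makes the caveats discussed below relevant.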
I think it would be good to mention something, with the caveat that the Akka team does not support DynamoDB Streams. That way people won't be prone to quickly dismiss DynamoDB because of the lack of Akka Persistence Query (which is super useful), and they'll have some alternatives to consider, especially if they don't want to manage their own Cassandra or EventStore.
DynamoDB Streams are really powerful, but if you introspect the underlying implementation of the persistence plugin, the events can be stored in the journal (the DynamoDB table) out of order. That's completely fine for the plugin, as it queries the data not by timestamp but by the sequence number of the events. Unfortunately, this breaks ordering in the DynamoDB stream, where records are strictly ordered by the time they were written to the table. I'm looking into ways to leverage this approach correctly, but quite possibly it would require a refactor of the current version of the plugin to prevent such a scenario from happening.
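If one consumes the stream anyway, the out-of-order arrival could in principle be compensated on the consumer side by re-sequencing per persistence ID. A rough sketch, assuming the persistenceId and sequenceNr can be decoded from each stream record (and that the consumer can hold state across records):

```scala
import scala.collection.mutable

// Decoded journal entry; field names are illustrative.
final case class JournalEvent(persistenceId: String, sequenceNr: Long, payload: Array[Byte])

// Parks events that arrive ahead of the next expected sequence number and
// emits them in order once the gap is filled.
class Resequencer(emit: JournalEvent => Unit) {
  private val nextExpected = mutable.Map.empty[String, Long].withDefaultValue(1L)
  private val parked = mutable.Map.empty[String, mutable.SortedMap[Long, JournalEvent]]

  def offer(event: JournalEvent): Unit = {
    val pid = event.persistenceId
    val buffer = parked.getOrElseUpdate(pid, mutable.SortedMap.empty)
    buffer.put(event.sequenceNr, event)
    // Drain as long as the next expected event is available.
    while (buffer.contains(nextExpected(pid))) {
      val nr = nextExpected(pid)
      emit(buffer.remove(nr).get)
      nextExpected(pid) = nr + 1
    }
  }
}
```

This trades latency and memory for ordering, and it only helps within a single consumer process.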
Another problem with the underlying implementation, when considering DynamoDB Streams, is that the events for one persistence ID are bucketed into groups of 100. The hash key in the DynamoDB table is used as the partition/shard key in the underlying stream. DynamoDB Streams scale proportionally with the table, and when new physical nodes are added, additional Lambda functions are spun up automatically to process those shards. So the events in one bucket, e.g. event 99, can sit on one physical node while the next event goes to another partition; there is then a race condition as to which event is processed first, because they materialize in different shards of the stream.
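To make that bucket-boundary race concrete, here is the arithmetic; the exact key format below is assumed, not taken from the plugin source:

```scala
object Bucketing {
  // 100 events per bucket, matching the behaviour described above.
  val PartitionSize = 100

  // Hypothetical partition key scheme; the real format is a plugin internal.
  def partitionKey(journalName: String, persistenceId: String, sequenceNr: Long): String =
    s"$journalName-P-$persistenceId-${sequenceNr / PartitionSize}"

  def main(args: Array[String]): Unit = {
    // Consecutive events straddling a bucket boundary get different partition
    // keys, and therefore potentially land in different stream shards.
    println(partitionKey("journal", "pid-1", 99))  // journal-P-pid-1-0
    println(partitionKey("journal", "pid-1", 100)) // journal-P-pid-1-1
  }
}
```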
The documentation mentions:
It would be good to reference the DynamoDB Streams documentation as an alternative to Akka Persistence Query for treating the event log as a stream that downstream services can consume.
Mentioning some tactics for capturing the stream, using either AWS Lambda or the KCL combined with the DynamoDB Streams Kinesis Adapter, would also be useful. See here:
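As a rough illustration of the KCL route, the following sketch wires the DynamoDB Streams Kinesis Adapter into a KCL 1.x Worker; the application name and stream ARN are placeholders, and checkpointing and error handling are omitted:

```scala
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.{IRecordProcessor, IRecordProcessorFactory}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.{InitialPositionInStream, KinesisClientLibConfiguration, Worker}
import com.amazonaws.services.kinesis.clientlibrary.types.{InitializationInput, ProcessRecordsInput, ShutdownInput}

object JournalStreamWorker {
  def main(args: Array[String]): Unit = {
    // The adapter makes a DynamoDB stream look like a Kinesis stream to the KCL.
    val streamsClient = AmazonDynamoDBStreamsClientBuilder.defaultClient()
    val adapterClient = new AmazonDynamoDBStreamsAdapterClient(streamsClient)

    val config = new KinesisClientLibConfiguration(
      "journal-stream-consumer",                        // KCL application (lease table) name
      "arn:aws:dynamodb:...:table/journal/stream/...",  // the journal table's stream ARN
      new DefaultAWSCredentialsProviderChain(),
      "worker-1"
    ).withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

    val factory = new IRecordProcessorFactory {
      override def createProcessor(): IRecordProcessor = new IRecordProcessor {
        override def initialize(input: InitializationInput): Unit = ()
        override def processRecords(input: ProcessRecordsInput): Unit =
          input.getRecords.forEach(r => println(s"record ${r.getSequenceNumber}"))
        override def shutdown(input: ShutdownInput): Unit = ()
      }
    }

    new Worker.Builder()
      .recordProcessorFactory(factory)
      .config(config)
      .kinesisClient(adapterClient)
      .build()
      .run()
  }
}
```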
Of course, these options would only be viable if the end user implements custom serialization of their events (choosing JSON or something with an IDL, like Protobuf).
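As an example of what that could look like, a custom Akka serializer along these lines; the `OrderPlaced` event and the hand-rolled JSON are purely illustrative, and a real implementation would use a proper JSON or Protobuf codec:

```scala
import java.nio.charset.StandardCharsets
import akka.serialization.SerializerWithStringManifest

// Hypothetical domain event, for illustration only.
final case class OrderPlaced(orderId: String)

class OrderEventSerializer extends SerializerWithStringManifest {
  // Must be unique among serializers registered with the ActorSystem.
  override def identifier: Int = 4711

  override def manifest(o: AnyRef): String = o match {
    case _: OrderPlaced => "OrderPlaced"
  }

  override def toBinary(o: AnyRef): Array[Byte] = o match {
    case OrderPlaced(id) =>
      // A real implementation would use a JSON library or Protobuf here.
      ("{\"orderId\":\"" + id + "\"}").getBytes(StandardCharsets.UTF_8)
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = manifest match {
    case "OrderPlaced" =>
      val json = new String(bytes, StandardCharsets.UTF_8)
      // Naive extraction for the sketch; use a real parser in practice.
      OrderPlaced(json.stripPrefix("{\"orderId\":\"").stripSuffix("\"}"))
  }
}
```

The serializer would then be bound to the event type via `akka.actor.serializers` and `akka.actor.serialization-bindings` in `application.conf`, so the journal (and hence the stream) carries these bytes instead of opaque Java serialization output.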
I would be happy to add this into the README if considered appropriate 😄
Cheers
Cal