
Mention DynamoDB Streams as a means to access the event log as a stream #76

Open
calvinlfer opened this issue Jul 27, 2018 · 4 comments

Comments

@calvinlfer
Contributor

The documentation mentions:

...it does not include an Akka Persistence Query plugin.

It would be good to reference the DynamoDB Streams documentation as an alternative to Akka Persistence Query for treating the event log as a stream that downstream services can consume.
Mentioning some tactics for capturing the stream, using either AWS Lambda or the KCL combined with the DynamoDB Streams Adapter, would also be useful. See here:

dynamodb-streams-design-patterns
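As a sketch of the Lambda route (a hypothetical handler; the `payload` attribute name and JSON serialization are illustrative assumptions, not the plugin's actual table layout):

```python
import json

def handler(event, context):
    """Hypothetical AWS Lambda handler for a DynamoDB Streams trigger.

    Assumes the journal writes each event's serialized form into a
    string attribute named "payload" (illustrative; without custom
    serialization the plugin stores an opaque binary blob).
    """
    events = []
    for record in event["Records"]:
        # Only react to newly written journal entries.
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        # DynamoDB Streams ship attribute-value maps, e.g. {"S": "..."}.
        events.append(json.loads(new_image["payload"]["S"]))
    # Forward the decoded events to a downstream consumer here.
    return events
```

With a custom JSON serializer configured, a trigger like this is all that is needed to fan events out to downstream services.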

Of course, these options would only be viable if the end user implements custom serialization of their events (choosing JSON, or something with an IDL such as Protobuf).

I would be happy to add this into the README if considered appropriate 😄

Cheers
Cal

@ktoso
Member

ktoso commented Jul 27, 2018

We have no experience with these on the team, so it would be hard for us to vet, but if you have such experience a guide could be helpful. We could mark it as "hints from the community" or similar, to make clear that the Akka team does not maintain or support it.

@calvinlfer
Contributor Author

calvinlfer commented Jul 27, 2018

I do have some experience with it and it's very easy to use with AWS Lambda (and there are tons of examples from AWS). AWS also provides good ordering guarantees:

  • Each stream record appears exactly once in the stream.
  • For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.

References:

I think it would be good to mention something, with the caveat that the Akka team does not support DynamoDB Streams. That way people won't be prone to quickly dismissing DynamoDB because of the lack of Akka Persistence Query (which is super useful), and they will have some alternatives to consider, especially if they don't want to manage their own Cassandra or EventStore.

@teroxik
Contributor

teroxik commented Dec 11, 2018

DynamoDB Streams are really powerful, but if you introspect the underlying implementation of the persistence plugin, events can be stored in the journal (the Dynamo table) out of order. That's completely fine for the plugin itself, as it queries the data by event sequence number rather than by timestamp. Unfortunately, it breaks ordering in the DDB stream, where events are strictly time-ordered, like a database event log.

Looking into ways to leverage this approach correctly, but it would quite possibly require a refactor of the current version of the plugin to prevent this scenario from happening.
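One consumer-side defense (a sketch, not part of the plugin) is to re-sequence records per persistence ID before handing them downstream, releasing events only once the next expected sequence number has arrived:

```python
class Resequencer:
    """Buffers out-of-order journal entries per persistence ID and
    releases them strictly in sequence-number order. A consumer-side
    sketch; assumes sequence numbers start at 1 with no gaps."""

    def __init__(self):
        self._pending = {}  # persistence_id -> {seq_nr: event}
        self._next = {}     # persistence_id -> next expected seq_nr

    def offer(self, persistence_id, seq_nr, event):
        """Accept one stream record; return events now in order."""
        self._pending.setdefault(persistence_id, {})[seq_nr] = event
        expected = self._next.setdefault(persistence_id, 1)
        pending = self._pending[persistence_id]
        released = []
        # Drain the run of consecutive sequence numbers, if complete.
        while expected in pending:
            released.append(pending.pop(expected))
            expected += 1
        self._next[persistence_id] = expected
        return released
```

Note this trades latency for ordering: an event arriving ahead of its predecessor is held until the gap is filled, which also means a lost record would stall that persistence ID.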

@teroxik
Contributor

teroxik commented Dec 14, 2018

Another problem with the underlying implementation when considering Dynamo Streams is that the events for one persistence ID are bucketed in groups of 100 events.

The hash key in DynamoDB is used as the partition/shard key in the underlying Dynamo stream. Dynamo Streams scales up proportionally to the Dynamo table, and when new physical nodes are added, additional Lambda functions start processing these streams automatically.

In theory, the events stored in one bucket, e.g. event 99, will be on one physical node while the next event goes to a different partition; there is then a race over which event gets processed first, as they materialize in different shards of the stream.
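To make the bucket boundary concrete (the exact key format below is an assumption for illustration, not the plugin's verified layout): with 100-event buckets, consecutive sequence numbers can map to different partition keys, and therefore to different stream shards:

```python
PARTITION_SIZE = 100  # the bucket size described above

def partition_key(journal_name, persistence_id, seq_nr):
    """Illustrative hash-key scheme: sequence numbers are grouped into
    buckets of 100, so the key changes at every bucket boundary.
    (The string format is an assumption, not the plugin's actual one.)"""
    return f"{journal_name}-P-{persistence_id}-{seq_nr // PARTITION_SIZE}"

# Sequence numbers 99 and 100 fall into different buckets, hence
# different shards, so their relative delivery order is not guaranteed.
```

Since shards are consumed independently (often by separate Lambda invocations), nothing orders the last event of one bucket relative to the first event of the next.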
