Mention DynamoDB Streams as a means to access the event log as a stream #76
Comments
We have no experience with these on the team, so it would be hard for us to vet it, but if you have such experience a guide could be helpful. We could mark it as hints from the community, or something similar, to make clear that the Akka team does not maintain or support it.
I do have some experience with it, and it's very easy to use with AWS Lambda (and there are tons of examples from AWS). AWS also provides good ordering guarantees.
References:
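To make the Lambda route concrete, here is a minimal sketch of a handler attached to the journal table's stream. The payload attribute name `pay` is an assumption about the plugin's journal item layout, so verify it against the actual table schema:

```scala
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent
import scala.jdk.CollectionConverters._

// Sketch of a Lambda handler attached to the journal table's DynamoDB stream.
class JournalStreamHandler extends RequestHandler[DynamodbEvent, Unit] {
  override def handleRequest(event: DynamodbEvent, context: Context): Unit =
    event.getRecords.asScala.foreach { record =>
      // Only INSERTs represent new journal entries.
      if (record.getEventName == "INSERT") {
        val newImage = record.getDynamodb.getNewImage
        // "pay" is assumed to be the binary attribute holding the serialized
        // event payload; check the journal table layout for the real name.
        Option(newImage.get("pay")).foreach { payload =>
          context.getLogger.log(s"New journal entry: ${payload.getB.remaining} bytes")
          // deserialize and forward to the downstream consumer here
        }
      }
    }
}
```

Note that the stream only guarantees ordering per shard, which is what makes the caveats discussed below relevant.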
I think it would be good to mention something, with the caveat that the Akka team does not support DynamoDB Streams. That way people won't be prone to quickly dismiss DynamoDB because of the lack of Akka Persistence Query (which is super useful), and they'll have some alternatives to consider, especially if they don't want to manage their own Cassandra or EventStore.
DynamoDB Streams are really powerful, but if you introspect the underlying implementation of the persistence plugin, the events can be stored in the journal (the DynamoDB table) out of order. That's completely fine for the plugin, as it queries the data not by timestamp but by the sequence number of the events. Unfortunately, this breaks ordering in the DynamoDB stream, where records are strictly ordered by the time they were written to the table. I'm looking into ways to leverage this approach correctly, but quite possibly it would require a refactor of the current version of the plugin to prevent such a scenario from happening.
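If one consumes the stream anyway, the out-of-order arrival could in principle be compensated on the consumer side by re-sequencing per persistence ID. A rough sketch, assuming the persistenceId and sequenceNr can be decoded from each stream record (and that the consumer can hold state across records):

```scala
import scala.collection.mutable

// Decoded journal entry; field names are illustrative.
final case class JournalEvent(persistenceId: String, sequenceNr: Long, payload: Array[Byte])

// Parks events that arrive ahead of the next expected sequence number and
// emits them in order once the gap is filled.
class Resequencer(emit: JournalEvent => Unit) {
  private val nextExpected = mutable.Map.empty[String, Long].withDefaultValue(1L)
  private val parked = mutable.Map.empty[String, mutable.SortedMap[Long, JournalEvent]]

  def offer(event: JournalEvent): Unit = {
    val pid = event.persistenceId
    val buffer = parked.getOrElseUpdate(pid, mutable.SortedMap.empty)
    buffer.put(event.sequenceNr, event)
    // Drain as long as the next expected event is available.
    while (buffer.contains(nextExpected(pid))) {
      val nr = nextExpected(pid)
      emit(buffer.remove(nr).get)
      nextExpected(pid) = nr + 1
    }
  }
}
```

This trades latency and memory for ordering, and it only helps within a single consumer process.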
Another problem with the underlying implementation, when considering DynamoDB Streams, is that the events for one persistence ID are bucketed into groups of 100. The hash key in the DynamoDB table is used as the partition/shard key in the underlying stream. DynamoDB Streams scale proportionally with the table, and when new physical nodes are added, additional Lambda functions are spun up automatically to process those shards. So the events in one bucket, e.g. event 99, can sit on one physical node while the next event goes to another partition; there is then a race condition as to which event is processed first, because they materialize in different shards of the stream.
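To make that bucket-boundary race concrete, here is the arithmetic; the exact key format below is assumed, not taken from the plugin source:

```scala
object Bucketing {
  // 100 events per bucket, matching the behaviour described above.
  val PartitionSize = 100

  // Hypothetical partition key scheme; the real format is a plugin internal.
  def partitionKey(journalName: String, persistenceId: String, sequenceNr: Long): String =
    s"$journalName-P-$persistenceId-${sequenceNr / PartitionSize}"

  def main(args: Array[String]): Unit = {
    // Consecutive events straddling a bucket boundary get different partition
    // keys, and therefore potentially land in different stream shards.
    println(partitionKey("journal", "pid-1", 99))  // journal-P-pid-1-0
    println(partitionKey("journal", "pid-1", 100)) // journal-P-pid-1-1
  }
}
```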
The documentation mentions:
It would be good to reference the DynamoDB Streams documentation as an alternative to Akka Persistence Query for treating the event log as a stream that downstream services can consume.
Mentioning some tactics for capturing the stream, using either AWS Lambda or the KCL combined with the DynamoDB Streams Kinesis Adapter, would also be useful. See here:
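As a rough illustration of the KCL route, the following sketch wires the DynamoDB Streams Kinesis Adapter into a KCL 1.x Worker; the application name and stream ARN are placeholders, and checkpointing and error handling are omitted:

```scala
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.{IRecordProcessor, IRecordProcessorFactory}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.{InitialPositionInStream, KinesisClientLibConfiguration, Worker}
import com.amazonaws.services.kinesis.clientlibrary.types.{InitializationInput, ProcessRecordsInput, ShutdownInput}

object JournalStreamWorker {
  def main(args: Array[String]): Unit = {
    // The adapter makes a DynamoDB stream look like a Kinesis stream to the KCL.
    val streamsClient = AmazonDynamoDBStreamsClientBuilder.defaultClient()
    val adapterClient = new AmazonDynamoDBStreamsAdapterClient(streamsClient)

    val config = new KinesisClientLibConfiguration(
      "journal-stream-consumer",                        // KCL application (lease table) name
      "arn:aws:dynamodb:...:table/journal/stream/...",  // the journal table's stream ARN
      new DefaultAWSCredentialsProviderChain(),
      "worker-1"
    ).withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

    val factory = new IRecordProcessorFactory {
      override def createProcessor(): IRecordProcessor = new IRecordProcessor {
        override def initialize(input: InitializationInput): Unit = ()
        override def processRecords(input: ProcessRecordsInput): Unit =
          input.getRecords.forEach(r => println(s"record ${r.getSequenceNumber}"))
        override def shutdown(input: ShutdownInput): Unit = ()
      }
    }

    new Worker.Builder()
      .recordProcessorFactory(factory)
      .config(config)
      .kinesisClient(adapterClient)
      .build()
      .run()
  }
}
```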
Of course, these options would only be viable if the end user implements custom serialization of their events (choosing JSON or something with an IDL, like Protobuf).
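As an example of what that could look like, a custom Akka serializer along these lines; the `OrderPlaced` event and the hand-rolled JSON are purely illustrative, and a real implementation would use a proper JSON or Protobuf codec:

```scala
import java.nio.charset.StandardCharsets
import akka.serialization.SerializerWithStringManifest

// Hypothetical domain event, for illustration only.
final case class OrderPlaced(orderId: String)

class OrderEventSerializer extends SerializerWithStringManifest {
  // Must be unique among serializers registered with the ActorSystem.
  override def identifier: Int = 4711

  override def manifest(o: AnyRef): String = o match {
    case _: OrderPlaced => "OrderPlaced"
  }

  override def toBinary(o: AnyRef): Array[Byte] = o match {
    case OrderPlaced(id) =>
      // A real implementation would use a JSON library or Protobuf here.
      ("{\"orderId\":\"" + id + "\"}").getBytes(StandardCharsets.UTF_8)
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = manifest match {
    case "OrderPlaced" =>
      val json = new String(bytes, StandardCharsets.UTF_8)
      // Naive extraction for the sketch; use a real parser in practice.
      OrderPlaced(json.stripPrefix("{\"orderId\":\"").stripSuffix("\"}"))
  }
}
```

The serializer would then be bound to the event type via `akka.actor.serializers` and `akka.actor.serialization-bindings` in `application.conf`, so the journal (and hence the stream) carries these bytes instead of opaque Java serialization output.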
I would be happy to add this into the README if considered appropriate 😄
Cheers
Cal