[spike] pluggable datastores for CDP #5526

Open
mollykarcher opened this issue Nov 12, 2024 · 1 comment

Comments

@mollykarcher
Contributor

What problem does your feature solve?

Ultimately, we want to make it as easy as possible for external contributors to create data lakes using different technologies (S3, R2, Mongo, MQTT, etc.). With how the code is currently structured, anyone who creates a new data storage option has to contribute it back to this repository (the Go monorepo). That means responsibility for the quality and future maintenance of every datastore technically lies with the maintainers of this repo (SDF). We don't want to slow things down by putting ourselves in the middle here, and we don't want to be the arbiters of what people can build and how they build it.

What would you like to see?

A spike or design proposal that outlines how we could restructure our code or repositories so that Galexie can accept pluggable datastores. This could mean that the interface for creating a datastore is public and lives in a separate SDF-owned repo, while the implementations live in different, dispersed repos. Also keep in mind that we already want to pull the consumption components of CDP (#5525) out into their own repo.

The "dream" dev journey could look something like:

  • Implement some interface for a datastore in my own GitHub repository
  • Download/install Galexie, and its configuration accepts my pluggable datastore interface/config with no code changes to Galexie necessary
  • Pull the ingest SDK in my language of choice, and its configuration accepts my pluggable datastore config with no code changes necessary
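
One way to picture the journey above is a name-based registry, similar in spirit to how Go's `database/sql` drivers register themselves. The sketch below is only an illustration under stated assumptions: the `DataStore` interface is deliberately simplified, and `Register`, `Open`, `Factory`, and `memStore` are hypothetical names, not the existing Galexie API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// DataStore is a simplified, hypothetical interface for illustration only;
// the real datastore interface in the monorepo may differ.
type DataStore interface {
	PutFile(ctx context.Context, path string, data []byte) error
	GetFile(ctx context.Context, path string) ([]byte, error)
	Exists(ctx context.Context, path string) (bool, error)
}

// Factory builds a DataStore from free-form config parameters.
type Factory func(params map[string]string) (DataStore, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register makes a datastore type selectable by name in configuration.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// Open resolves the type named in config to a registered factory.
func Open(name string, params map[string]string) (DataStore, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown datastore type %q", name)
	}
	return f(params)
}

// ErrNotFound is returned by memStore when a path has no file.
var ErrNotFound = errors.New("file not found")

// memStore is an in-memory stand-in for an externally maintained datastore.
type memStore struct {
	mu    sync.Mutex
	files map[string][]byte
}

func (m *memStore) PutFile(_ context.Context, path string, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.files[path] = append([]byte(nil), data...) // store a copy
	return nil
}

func (m *memStore) GetFile(_ context.Context, path string) ([]byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	data, ok := m.files[path]
	if !ok {
		return nil, ErrNotFound
	}
	return data, nil
}

func (m *memStore) Exists(_ context.Context, path string) (bool, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.files[path]
	return ok, nil
}

// In a real plugin this init would live in the contributor's own repository.
func init() {
	Register("memory", func(map[string]string) (DataStore, error) {
		return &memStore{files: map[string][]byte{}}, nil
	})
}

func main() {
	store, err := Open("memory", nil)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()
	if err := store.PutFile(ctx, "ledgers/0001.xdr", []byte("payload")); err != nil {
		panic(err)
	}
	data, err := store.GetFile(ctx, "ledgers/0001.xdr")
	fmt.Println(string(data), err)
}
```

Note that a registry like this still requires the implementation to be linked into the binary at build time, so on its own it does not achieve "no code changes to Galexie"; it only moves ownership of the implementation out of the monorepo.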
@sreuland
Contributor

sreuland commented Dec 2, 2024

Wanted to post an idea for consideration: run the pluggable datastore as a separate OS process and leverage inter-process RPC:

Image

Create a new datastore.PluggableDatastore as an implementation of the datastore.Datastore interface. It encapsulates the remote Datastore interactions: managing the process lifecycle and converting all datastore.Datastore methods into equivalent RPC messages via 0MQ REQ/REP exchanges. This decouples Datastores from being a binary dependency in application code.
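
A rough sketch of what that client-side proxy could look like follows. Everything here is an assumption made for illustration: the JSON `Request`/`Response` envelope, the `Transport` interface, and the three methods shown are hypothetical, and an in-memory `fakeService` stands in for the remote process. A real 0MQ REQ socket would sit behind `Transport` by sending one message and then receiving one reply.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Request and Response form a hypothetical wire envelope, not a defined protocol.
type Request struct {
	Method string `json:"method"`
	Path   string `json:"path"`
	Data   []byte `json:"data,omitempty"`
}

type Response struct {
	Data   []byte `json:"data,omitempty"`
	Exists bool   `json:"exists,omitempty"`
	Error  string `json:"error,omitempty"`
}

// Transport performs one request/reply exchange with the remote datastore.
type Transport interface {
	RoundTrip(req []byte) ([]byte, error)
}

// PluggableDatastore turns each method call into one RPC message.
type PluggableDatastore struct {
	t Transport
}

// call encodes the request, performs the exchange, and decodes the reply.
func (p *PluggableDatastore) call(req Request) (Response, error) {
	var resp Response
	raw, err := json.Marshal(req)
	if err != nil {
		return resp, err
	}
	reply, err := p.t.RoundTrip(raw)
	if err != nil {
		return resp, err
	}
	if err := json.Unmarshal(reply, &resp); err != nil {
		return resp, err
	}
	if resp.Error != "" {
		return resp, errors.New(resp.Error)
	}
	return resp, nil
}

func (p *PluggableDatastore) PutFile(path string, data []byte) error {
	_, err := p.call(Request{Method: "PutFile", Path: path, Data: data})
	return err
}

func (p *PluggableDatastore) GetFile(path string) ([]byte, error) {
	resp, err := p.call(Request{Method: "GetFile", Path: path})
	return resp.Data, err
}

func (p *PluggableDatastore) Exists(path string) (bool, error) {
	resp, err := p.call(Request{Method: "Exists", Path: path})
	return resp.Exists, err
}

// fakeService is an in-process stand-in for the remote datastore service.
type fakeService struct {
	files map[string][]byte
}

func (f *fakeService) RoundTrip(raw []byte) ([]byte, error) {
	var req Request
	if err := json.Unmarshal(raw, &req); err != nil {
		return nil, err
	}
	var resp Response
	switch req.Method {
	case "PutFile":
		f.files[req.Path] = req.Data
	case "GetFile":
		data, ok := f.files[req.Path]
		if !ok {
			resp.Error = "file not found"
		} else {
			resp.Data = data
		}
	case "Exists":
		_, resp.Exists = f.files[req.Path]
	default:
		resp.Error = "unknown method " + req.Method
	}
	return json.Marshal(resp)
}

func main() {
	ds := &PluggableDatastore{t: &fakeService{files: map[string][]byte{}}}
	if err := ds.PutFile("ledgers/0001.xdr", []byte("payload")); err != nil {
		panic(err)
	}
	data, err := ds.GetFile("ledgers/0001.xdr")
	fmt.Println(string(data), err)
}
```

Process lifecycle management (spawning and supervising the datastore binary) is left out of this sketch.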

Image

A Pluggable Datastore service could be implemented in any programming language that compiles to an OS binary and for which 0MQ provides an SDK, which covers most languages.

A Pluggable Datastore service implementation needs to follow the contract for pluggability: initiate a 0MQ socket and support a REQ handler for each of the datastore interface methods.
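
To make the service-side contract a bit more concrete, here is a minimal sketch of a dispatch loop with one handler per method. It rests on assumptions: the JSON envelope, the `ReplySocket` interface (modeled on the strict receive-then-send alternation of a reply socket), and the method names are all hypothetical, and a `scriptedSocket` replaying canned requests stands in for a real 0MQ socket.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// Request and Response form a hypothetical wire envelope, not a defined protocol.
type Request struct {
	Method string `json:"method"`
	Path   string `json:"path"`
	Data   []byte `json:"data,omitempty"`
}

type Response struct {
	Data   []byte `json:"data,omitempty"`
	Exists bool   `json:"exists,omitempty"`
	Error  string `json:"error,omitempty"`
}

// Handler serves one datastore interface method.
type Handler func(req Request) Response

// ReplySocket is a hypothetical abstraction: every Recv is answered by one Send.
type ReplySocket interface {
	Recv() ([]byte, error)
	Send(msg []byte) error
}

// Serve dispatches each request to its handler until the socket reports io.EOF.
func Serve(sock ReplySocket, handlers map[string]Handler) error {
	for {
		raw, err := sock.Recv()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		var req Request
		var resp Response
		if err := json.Unmarshal(raw, &req); err != nil {
			resp.Error = "bad request: " + err.Error()
		} else if h, ok := handlers[req.Method]; ok {
			resp = h(req)
		} else {
			resp.Error = "unsupported method: " + req.Method
		}
		out, err := json.Marshal(resp)
		if err != nil {
			return err
		}
		if err := sock.Send(out); err != nil {
			return err
		}
	}
}

// NewMemoryHandlers returns handlers backed by an in-memory map.
func NewMemoryHandlers() map[string]Handler {
	files := map[string][]byte{}
	return map[string]Handler{
		"PutFile": func(r Request) Response {
			files[r.Path] = r.Data
			return Response{}
		},
		"GetFile": func(r Request) Response {
			data, ok := files[r.Path]
			if !ok {
				return Response{Error: "file not found"}
			}
			return Response{Data: data}
		},
		"Exists": func(r Request) Response {
			_, ok := files[r.Path]
			return Response{Exists: ok}
		},
	}
}

// scriptedSocket replays canned requests and records the replies.
type scriptedSocket struct {
	in  [][]byte
	out [][]byte
}

func (s *scriptedSocket) Recv() ([]byte, error) {
	if len(s.in) == 0 {
		return nil, io.EOF
	}
	msg := s.in[0]
	s.in = s.in[1:]
	return msg, nil
}

func (s *scriptedSocket) Send(msg []byte) error {
	s.out = append(s.out, msg)
	return nil
}

func main() {
	put, _ := json.Marshal(Request{Method: "PutFile", Path: "a", Data: []byte("hi")})
	get, _ := json.Marshal(Request{Method: "GetFile", Path: "a"})
	sock := &scriptedSocket{in: [][]byte{put, get}}
	if err := Serve(sock, NewMemoryHandlers()); err != nil {
		panic(err)
	}
	for _, reply := range sock.out {
		fmt.Println(string(reply))
	}
}
```

Unknown methods get an error reply rather than being dropped, so the requester is never left waiting on a missing response.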

[edit...couple days later]
Looking at other blockchain projects for precedent in using distributed processing, rather than monolithic builds or dynamic loading, for application runtime composition: one such project is Tendermint ABCI. That architecture is analogous, with the Tendermint core playing the role of Galexie/Consumer, the Application being the remote Datastore instance proposed here, and the ABCI corresponding to the Datastore interface.

@urvisavla self-assigned this Dec 3, 2024