Refactor provenance #300

Open
brandomr opened this issue Aug 1, 2023 · 0 comments

Current state

Currently we store provenance information in both postgres and neo4j. This leads to redundancy and complexity. It also causes some strange behavior: for example, the timestamp at which a relationship was created can only be found in postgres, when it could easily be set on the relation in neo4j. It also means we have to retain autogen/enums.py and other vestigial code.

Additionally, the graph relations and queries are far more complex than current HMI needs require. This can be simplified.

Proposed state

Store all provenance information directly in neo4j and remove any lingering provenance data from postgres.

Simplified relationships

There are currently only a few things we care about provenance for:

  1. model is related to document
  2. dataset is related to document
  3. document is related to document
  4. document is related to artifact (code)
  5. model is derived from artifact (code)
  6. model is derived from model
  7. dataset is derived from simulation
  8. dataset is derived from dataset

Each relationship/edge should have a timestamp for when it was set.

NOTE: Users and projects DO NOT belong in provenance relationships at this time. That is a complication best addressed later.
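
As a rough illustration of the simplified model above, here is a minimal Python sketch of the eight allowed edges, each stamped with a creation time. The label names (`Model`, `Document`, and so on), the relation names `RELATED_TO` and `DERIVED_FROM`, and the `make_edge` helper are all assumptions made for this example; the issue does not fix a naming scheme.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical label and relation names; only the pairings come from the list above.
ALLOWED_EDGES = {
    ("Model", "RELATED_TO", "Document"),
    ("Dataset", "RELATED_TO", "Document"),
    ("Document", "RELATED_TO", "Document"),
    ("Document", "RELATED_TO", "Artifact"),
    ("Model", "DERIVED_FROM", "Artifact"),
    ("Model", "DERIVED_FROM", "Model"),
    ("Dataset", "DERIVED_FROM", "Simulation"),
    ("Dataset", "DERIVED_FROM", "Dataset"),
}


@dataclass(frozen=True)
class Edge:
    source_type: str
    source_id: str
    relation: str
    target_type: str
    target_id: str
    timestamp: datetime  # when the relationship was set


def make_edge(source_type, source_id, relation, target_type, target_id, now=None):
    """Build a provenance edge, rejecting any pairing not in ALLOWED_EDGES."""
    if (source_type, relation, target_type) not in ALLOWED_EDGES:
        raise ValueError(
            f"unsupported provenance edge: {source_type} -{relation}-> {target_type}"
        )
    # Default to the current UTC time so every edge carries a timestamp.
    stamp = now if now is not None else datetime.now(timezone.utc)
    return Edge(source_type, source_id, relation, target_type, target_id, stamp)
```

Because users and projects are absent from `ALLOWED_EDGES`, an attempt such as `make_edge("User", "u1", "RELATED_TO", "Document", "d1")` raises `ValueError` in this sketch.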

Proposed queries

The HMI needs a trivial way to run a single search:

Find all nodes of specific type(s) that are 1 hop from a given node

We can allow greater than 1 hop in the future, but for now I do not see a use case in the HMI for >1 hop. This single query should be able to cover all immediate use cases, including:

  1. Find all items related to a document
  2. Find models derived from my current model
  3. Find the associated code artifact from which my model was derived
  4. Etc.
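
The single 1-hop search could look something like the sketch below, which only builds the Cypher text and its parameters and does not talk to a database. It assumes every node carries an `id` property and every edge a `timestamp` property; those property names, the label names, and the `one_hop_query` helper are hypothetical and not taken from the existing codebase.

```python
def one_hop_query(node_id, node_types):
    """Return (cypher, params) for all nodes of the given types 1 hop from node_id."""
    if not node_types:
        raise ValueError("at least one node type is required")
    # The undirected pattern -[r]- matches edges in either direction, so the same
    # query finds both "derived from my model" and "what my model was derived from".
    query = (
        "MATCH (n {id: $node_id})-[r]-(m) "
        "WHERE any(label IN labels(m) WHERE label IN $node_types) "
        "RETURN m, type(r) AS relation, r.timestamp AS timestamp"
    )
    params = {"node_id": node_id, "node_types": list(node_types)}
    return query, params


# Example: find models and datasets related to a document (ids are made up).
query, params = one_hop_query("document-123", ["Model", "Dataset"])
```

The returned pair could then be handed to the neo4j Python driver, for example `session.run(query, params)`. Passing the types as a parameter rather than interpolating labels into the query text keeps the query string constant, at the cost of filtering on `labels(m)` instead of matching a label directly.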

Assumptions

This assumes that #299 is in place.
