Deleting a graph from PropertyGraphDataSource #930

Open
goshaQ opened this issue Aug 19, 2019 · 2 comments


goshaQ commented Aug 19, 2019

The deleteGraph method doesn't take into account the constraints created on the metaLabel. The problem is that DETACH DELETE removes nodes and their relationships but not the associated constraints and indexes. That causes an error when a PropertyGraph is deleted and then written back under the same name. See the following example:

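val neo4jSource = GraphSources.cypher.neo4j(neo4jConfig)
val name = GraphName("arbitraryGraph")
neo4jSource.store(name, graph)   // first write succeeds
neo4jSource.delete(name)         // removes the data, but not the constraints and indexes
neo4jSource.store(name, graph)   // throws GraphAlreadyExistsException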

That causes the following exception:

Exception in thread "main" org.opencypher.okapi.impl.exception.GraphAlreadyExistsException: A graph with name arbitraryGraph is already stored in this graph data source.
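
The failure mode described above can be illustrated with a toy in-memory model. This is only a sketch under a stated assumption, namely that a graph counts as "stored" as long as its meta label is still known to the database, whether through nodes or through leftover schema objects. `ToyStore` and all of its members are hypothetical and are not part of Morpheus or Neo4j.

```scala
import scala.collection.mutable

// Toy stand-in for a graph database; every name here is hypothetical.
final class ToyStore {
  private val nodeLabels   = mutable.Set.empty[String] // labels carried by stored nodes
  private val schemaLabels = mutable.Set.empty[String] // labels referenced by constraints/indexes

  // Assumption: the graph is considered present if its meta label is still
  // known through nodes OR through schema objects.
  def hasGraph(metaLabel: String): Boolean =
    nodeLabels(metaLabel) || schemaLabels(metaLabel)

  def store(metaLabel: String): Unit = {
    if (hasGraph(metaLabel))
      throw new IllegalStateException(s"A graph with name $metaLabel is already stored")
    nodeLabels += metaLabel
    schemaLabels += metaLabel // storing also creates a constraint on the meta label
  }

  // Mirrors DETACH DELETE: only the nodes go away, the schema stays.
  def delete(metaLabel: String): Unit = nodeLabels -= metaLabel

  // Variant that also drops the schema objects.
  def deleteWithSchema(metaLabel: String): Unit = {
    nodeLabels -= metaLabel
    schemaLabels -= metaLabel
  }
}
```

In this model, `store`, `delete`, `store` fails on the second `store`, while `store`, `deleteWithSchema`, `store` succeeds.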

Moreover, it doesn't allow writing a PropertyGraph under entireGraphName, which makes no sense to me. What if I just want to store everything in the database and never restore that PropertyGraph (so the metaLabel and related properties, such as ___morpheusID, need not be saved), because I have other data sources and the graph database is only the destination for the processed data?

Mats-SX (Member) commented Sep 26, 2019

Hello @goshaQ and thanks for reaching out to us.

I agree that the store-delete-store sequence should probably be reconsidered in the scenario you describe. However, it isn't exactly the expected way of working with the PGDSs, as storing a graph is typically a costly operation. Could you tell us a little bit more about your use case for this order of operations?

In the current design, we require a fixed way of viewing the entirety of a Neo4j database as a graph, and we must choose a name for this. This name may then not be used by any subgraph, because if we allowed that, it wouldn't be clear which one you would get when referencing the name: the union of all the subgraphs (i.e. the entire graph) or just that one subgraph. In order to preserve the notion of the Neo4j database as one large graph, we need to reserve a name for this purpose.

In summary, I would agree with the following changes:

  • Remove the metaProperty (e.g. ___morpheusID) once writing is done, similar to what Neo4jGraphMerge does, because it is only used as temporary information in the intermediate state while nodes and relationships are being created in the Neo4j database
  • Delete indexes/constraints when the graph is deleted, so that writing another graph under a deleted graph's name does not fail
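
The second change could be sketched as follows. This is not the Morpheus implementation, only an illustration of which Cypher statements a schema-aware delete might issue. The object `DropSchemaSketch`, its methods, the meta label value, and the choice of constrained and indexed properties are all assumptions made for the example. The `DROP` statements use the Neo4j 3.x syntax that was current when this issue was opened; later Neo4j versions drop constraints and indexes by name instead.

```scala
// Hypothetical sketch; none of these names come from the Morpheus code base.
object DropSchemaSketch {

  // Neo4j 3.x syntax for dropping a uniqueness constraint on a node property.
  def dropUniqueConstraint(label: String, property: String): String =
    s"DROP CONSTRAINT ON (n:`$label`) ASSERT n.`$property` IS UNIQUE"

  // Neo4j 3.x syntax for dropping a single-property index.
  def dropIndex(label: String, property: String): String =
    s"DROP INDEX ON :`$label`(`$property`)"

  // Builds the statements for deleting a graph together with its schema.
  // Assumption: the caller knows the meta label and which properties carry
  // constraints or indexes; how Morpheus tracks these is not shown here.
  def deleteGraphStatements(
      metaLabel: String,
      uniqueProps: Seq[String],
      indexedProps: Seq[String]
  ): Seq[String] =
    Seq(s"MATCH (n:`$metaLabel`) DETACH DELETE n") ++
      uniqueProps.map(dropUniqueConstraint(metaLabel, _)) ++
      indexedProps.map(dropIndex(metaLabel, _))
}
```

For example, `DropSchemaSketch.deleteGraphStatements("___arbitraryGraph", Seq("___morpheusID"), Seq.empty)` yields one data statement followed by one constraint drop. Neo4j does not allow schema changes in a transaction that has already performed data updates, so the statements would have to run in separate transactions.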

Does this sound sensible to you?

goshaQ (Author) commented Sep 28, 2019

Thanks for your reply, @Mats-SX!

I discovered the problem while experimenting with the final export stage of the processing pipeline. In particular, the exception appeared during an integration test, where a small amount of enriched data is expected to be written into Neo4j and removed at the end of the test.

The listed changes sound perfect to me!
