[Bug] Speed up counting entities for copy/graft #5475

Open
lutter opened this issue Jun 6, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@lutter
Collaborator

lutter commented Jun 6, 2024

We need to do something about the entity_count for grafts. Right now, when all data has been copied, graph-node will fire off a big query that counts the entities in the graft; that query can take hours in very large subgraphs.

There are a few different ways to handle that:

  • give up on accurate entity counts and set the count for copies/grafts to some fast estimate (either the count from the source, or the row estimate that Postgres' ANALYZE comes up with)
  • count entities while we copy them. We'd have to turn queries of the form `insert into dst select * from src` into `with ranges as (insert into .. returning block_range) select count(*) from ranges where block_range @> int32::MAX` and then store the counts for each batch in `copy_table_state`. After data copying has finished, the entity count is a simple aggregation over `copy_table_state`
  • keep counting entities as a separate step, but break it into batches along vid just like the actual copying does. That would require quite a bit more bookkeeping, as counting could then be interrupted by node restarts
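
To make the second option a bit more concrete, here is a rough in-memory sketch of counting while copying. It is not graph-node code: `BlockRange`, `Row`, `CopyTableState`, `copy_batch` and `entity_count` are all hypothetical names, the `Vec`s stand in for the source and destination tables, and the real `copy_table_state` table may look quite different. The only ideas taken from above are that each batch records how many copied rows have a `block_range` containing the maximum `int32` block, and that the final count is a sum over those per-batch numbers.

```rust
/// Simplified stand-in for a Postgres `int4range` block range.
/// `end == None` models an unbounded upper bound (a current entity version).
#[derive(Clone, Copy, Debug)]
pub struct BlockRange {
    pub start: i32,
    pub end: Option<i32>, // exclusive upper bound
}

impl BlockRange {
    /// Mirrors the check `block_range @> block`.
    pub fn contains(&self, block: i32) -> bool {
        self.start <= block && self.end.map_or(true, |end| block < end)
    }
}

#[derive(Clone, Copy, Debug)]
pub struct Row {
    pub vid: i64,
    pub block_range: BlockRange,
}

/// Hypothetical per-table progress record, loosely modelled on the idea of
/// `copy_table_state`; the field names are assumptions.
#[derive(Debug, Default)]
pub struct CopyTableState {
    /// First vid that has not been copied yet.
    pub next_vid: i64,
    /// Number of current entity versions seen in each finished batch.
    pub batch_counts: Vec<i64>,
}

/// Copies the rows with `next_vid <= vid < next_vid + batch_size` into `dst`
/// and records how many of them are current entity versions.
/// Returns `true` while rows with a larger vid remain in `src`.
pub fn copy_batch(
    src: &[Row],
    dst: &mut Vec<Row>,
    state: &mut CopyTableState,
    batch_size: i64,
) -> bool {
    let lo = state.next_vid;
    let hi = lo + batch_size;
    let batch: Vec<Row> = src
        .iter()
        .copied()
        .filter(|r| r.vid >= lo && r.vid < hi)
        .collect();
    // Equivalent of `select count(*) from ranges where block_range @> int32::MAX`.
    let live = batch
        .iter()
        .filter(|r| r.block_range.contains(i32::MAX))
        .count() as i64;
    dst.extend(batch);
    state.batch_counts.push(live);
    state.next_vid = hi;
    src.iter().any(|r| r.vid >= hi)
}

/// After copying has finished, the entity count is a sum over the batches.
pub fn entity_count(state: &CopyTableState) -> i64 {
    state.batch_counts.iter().sum()
}
```

Because the per-batch count is stored together with the copy progress, a restart in this sketch would resume from `next_vid` without recounting finished batches. In a real implementation the copy and the state update would presumably need to happen in the same transaction for that to hold.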

lutter added the bug label Jun 6, 2024