Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workers: replace materialized views with incrementally-updated rollup tables #56

Open
mojodna opened this issue Dec 13, 2018 · 0 comments

Comments

@mojodna
Copy link
Member

mojodna commented Dec 13, 2018

User and hashtag statistics are facilitated by the user_stats and hashtag_stats tables. These are updated continuously using housekeeping-loop.sh.

Under some conditions, refreshing the materialized views may take upwards of 2 hours (hence the 2h timeout), resulting in apparent lag in the API (even when the raw_changesets data is up-to-date).

Replacing the materialized views with incrementally-updated rollup tables would improve the latency tremendously while reducing the amount of I/O done by Postgres.

This post summarizes the 2 approaches well: https://www.citusdata.com/blog/2018/10/31/materialized-views-vs-rollup-tables/

I see 2 ways to approach this:

Way 1 - manually

  1. create + populate a table matching the schema of the current materialized view
  2. introduce additional UPDATE SQL to src/stats.js to accompany the existing INSERTs/UPDATEs to the raw_changesets table

This keeps the aggregation logic in one place, but may under/over-count (in the rollup table) if errors occur.

Way 2 - triggers

  1. create + populate a table matching the schema of the current materialized view
  2. create a trigger that updates relevant rollup tables when rows are added to raw_changesets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant