
Investigate populating Database #137

Open
ipc103 opened this issue Apr 25, 2024 · 1 comment

ipc103 commented Apr 25, 2024

Right now, fetching our data and building our site is essentially a four-step process. Each step requires the previous one to have been completed in the same flow.

  1. Fetch data from the GitHub API.
  2. Load that data directly into a data.json file.
  3. Build the static site, including the data.json file.
  4. Deploy the site to GitHub pages.
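Steps 1–2 of the current flow could be sketched as below. This is a minimal illustration only: the actual endpoint, metrics, and script used by this project aren't specified in the issue, so the repo fields here (stars, forks, open issues) are placeholder assumptions.

```python
# Hypothetical sketch of "fetch from the GitHub API, write data.json".
import json
import urllib.request

def fetch_repo_metrics(owner: str, repo: str) -> dict:
    """Fetch basic repository metrics from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # Placeholder fields; the real fetch script's metrics may differ.
    return {
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "open_issues": data["open_issues_count"],
    }

def write_data_json(metrics: dict, path: str = "data.json") -> None:
    """Write the fetched metrics to the data.json the build consumes."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)
```

Because data.json is regenerated inline like this, the build can't start until the API calls finish.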

Since the data.json file doesn't get persisted beyond the build, we need to re-fetch all of the data on every build, and we're also limited in the kinds of data we can store.

An alternative approach would be:

  1. Fetch data from the GitHub API.
  2. Load that data into a database somewhere (CosmosDB, for example).
  3. Before build, load the data from the database into a data.json file.
  4. Build the static site, including the data.json file.
  5. Deploy the site to GitHub pages.
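Step 3 of the alternative flow might look something like this. The issue mentions CosmosDB only as an example and no store has been chosen, so this sketch uses SQLite as a stand-in key-value store; the table name and schema are assumptions.

```python
# Sketch of "before build, load the data from the database into data.json",
# with SQLite standing in for whatever key-value store is eventually chosen.
import json
import sqlite3

def export_to_data_json(db_path: str, out_path: str = "data.json") -> None:
    """Read all key/value rows and emit the data.json the build expects."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT key, value FROM metrics").fetchall()
    conn.close()
    # Values are stored as JSON strings so arbitrary shapes survive the trip.
    data = {key: json.loads(value) for key, value in rows}
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)
```

Keeping the output shape identical to today's data.json means the static-site build step wouldn't need to change at all.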

This would have two big advantages:

  1. Deploying the site would no longer be dependent on fetching data from the GitHub API. This should make the overall build/deploy time faster.
  2. Storing in a database gives us much more flexibility on the type of data we store. For example, this could potentially make time-series calculations more feasible.

As a first pass, I'd propose the following course of action:

  • Create an external database store. As a first pass, having a simple key-value store (to mimic our current JSON format) might work best.
  • Write a new script to fetch the existing metrics from the GitHub API and save the results to the new database. For now it's okay if the results overwrite the previous results each time.
  • Add a new workflow to run on both workflow_dispatch and a cron (every day?) that updates the new database.
  • Once we have confidence that the data is being updated, update our deploy workflow to pull data from the database and generate a data.json file instead of running the fetch script directly.

Once that's set up, we can look at adding additional metric stores in a follow-up.
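The second action item (fetch existing metrics and save them to the new database, overwriting prior results) could be sketched as follows. As above, SQLite stands in for the unchosen store, and the table name is an assumption.

```python
# Sketch of the proposed fetch-and-save script: upsert each metric so that
# re-running simply overwrites the previous results, matching the
# "first pass" behavior described in the issue.
import json
import sqlite3

def save_metrics(db_path: str, metrics: dict) -> None:
    """Upsert metrics into a key-value table, replacing prior values."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO metrics (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        [(k, json.dumps(v)) for k, v in metrics.items()],
    )
    conn.commit()
    conn.close()
```

A workflow on workflow_dispatch plus a daily cron would just call this script; once the data looks trustworthy, the deploy workflow can switch to reading from the store.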

ipc103 commented Apr 25, 2024

CC @Lehcar this is something we've discussed a bunch but apparently never documented in an issue.

@Lehcar changed the title from "Invesitage populating Database" to "Investigate populating Database" on Apr 25, 2024