Draft: Example nextjs site that imports repositories from GitHub to Splitgraph, then exports them to Seafowl, then renders a chart of stargazers #21

milesrichardson · 2023-05-26T15:18:06Z

No description provided.

Assisted by the one and only GPT-4

Implement the import panel and stub out the export panel, using a single Stepper component and a react context with a reducer for managing the state. Implement the fetch requests to start the import, and also to await the import. Co-Authored by GPT-4 ;)

…ort-to-seafowl-task` The `start-export-to-seafowl` route takes a list of source tables from Splitgraph (list of `{namespace,repository,table}`), and starts a task to export them to Seafowl. It returns a list of objects `{taskId: string; tableName: string;}`, where each item represents the currently exporting table (and `tableName` is the source table name). The `await-export-to-seafowl-task` route takes a single `taskId` parameter and returns its status, i.e. `{completed: boolean; ...otherInfo}`
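As a rough sketch of the shapes these two routes exchange: the field names below come from the description above, while the type names and the `startExports`/`isDone` helpers are hypothetical and only illustrate the contract, not the actual route code.

```typescript
// Request/response shapes as described above. The type names are hypothetical.
interface SourceTable {
  namespace: string;
  repository: string;
  table: string;
}

interface StartedExport {
  taskId: string;
  tableName: string; // the *source* table name
}

interface ExportTaskStatus {
  completed: boolean;
  [otherInfo: string]: unknown;
}

// Hypothetical helper: start one task per table and report which task covers
// which table. `startTask` stands in for whatever call really starts the export.
function startExports(
  tables: SourceTable[],
  startTask: (table: SourceTable) => string
): StartedExport[] {
  return tables.map((table) => ({
    taskId: startTask(table),
    tableName: table.table,
  }));
}

// A status response counts as done only when `completed` is true.
function isDone(status: ExportTaskStatus): boolean {
  return status.completed === true;
}
```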

The ExportPanel first renders a "Start Export" button. Then, while the export is running, it renders an `ExportTableLoadingBar` for each table that is being exported. Each of these components sends its own polling request with its `taskId` to the `await-export-to-seafowl-task` endpoint, and upon completion of each task, sends an action to the reducer, which handles it by updating the set of loading tasks. When the set of loading tasks is complete, it changes the `stepperState` to `export_complete`. If any of the tasks has an error, then the `stepperState` changes to `export_error`, which should cause all loading bars to unmount, i.e., any error will short-circuit all of the table loading, even if some were to complete. At that point the user can click "Start Export" again. This completes the logic necessary for import and export, and now it's just a matter of styling the components, linking to the Splitgraph Console, adding explanatory text, and finally rendering a chart with the data. We'll also want to create a meta-repository in Splitgraph for tracking which GitHub repos we've imported so far, analogously to how we track Socrata metadata for each Socrata repo.
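The reducer behavior described above could look roughly like the following. This is a minimal sketch, not the code in the PR: `export_complete` and `export_error` are the state names from the description, but the `awaiting_export` state, the action names, and the use of a plain array for the set of loading tasks are assumptions.

```typescript
// `export_complete` and `export_error` come from the description; the rest is assumed.
type StepperState = "awaiting_export" | "export_complete" | "export_error";

interface ExportState {
  stepperState: StepperState;
  loadingTaskIds: string[]; // task IDs that are still being polled
}

type ExportAction =
  | { type: "start_export"; taskIds: string[] }
  | { type: "export_task_complete"; taskId: string }
  | { type: "export_task_error"; taskId: string };

function exportReducer(state: ExportState, action: ExportAction): ExportState {
  switch (action.type) {
    case "start_export":
      return { stepperState: "awaiting_export", loadingTaskIds: action.taskIds.slice() };
    case "export_task_complete": {
      // Ignore completions that arrive after an error has short-circuited the export.
      if (state.stepperState !== "awaiting_export") return state;
      const doneId = action.taskId;
      const remaining = state.loadingTaskIds.filter((id) => id !== doneId);
      return {
        stepperState: remaining.length === 0 ? "export_complete" : "awaiting_export",
        loadingTaskIds: remaining,
      };
    }
    case "export_task_error":
      // Any single failure clears the loading set, so every loading bar unmounts.
      return { stepperState: "export_error", loadingTaskIds: [] };
    default:
      return state;
  }
}
```

Clicking "Start Export" again after an error would dispatch `start_export` with fresh task IDs, which resets the loading set.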

The `airbyte-github` plugin by default imports 163 tables into Splitgraph, but we only need a few of them for the analytics queries we want to make in the demo app. So, hardcode the list of those, but also hardcode the list of all 163 tables for reference, and also the 43 tables that are imported given the relevant tables (because either they depend on them via a foreign key relationship, or they're an airbyte meta table). For the 43 tables, see this recent import of `splitgraph/seafowl`: https://www.splitgraph.com/miles/github-import-splitgraph-seafowl/20230526-224723/-/tables This took 3 minutes and 40 seconds to import into Splitgraph.

…umed across page loads Keep track of the current stepper state (e.g. taskId, import completion, etc.) in the URL. Update the URL when the state changes, and initialize the state from the URL on page load. Note that we need to default to an "uninitialized" state, and then update the state from the URL via an `initialize_from_url` action, because the `useRouter` hook is asynchronous, and we don't look at query parameters on the server side with `getInitialProps` or similar. Thus we can show a loading bar before showing the import form (or whatever we're showing based on the current state). This makes development easier, since after a long import we can refresh the page with the URL containing the task ID and start from there, rather than re-importing every time. It also helps users, who can refresh the page without losing progress if an import has already started (the page will just poll the taskId from the URL).
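To illustrate the "uninitialized, then `initialize_from_url`" flow, here is a small sketch of a reducer that derives its state from parsed query parameters, plus the inverse mapping used when writing state back to the URL. The `uninitialized` state and the `initialize_from_url` action are named in the description; the other state names, the `taskId` query parameter, and the helper names are assumptions made for the example.

```typescript
// Query params as Next.js exposes them: each value may be a string or an array.
type Query = Record<string, string | string[] | undefined>;

interface StepperStateShape {
  stepperState: "uninitialized" | "unstarted" | "awaiting_import";
  importTaskId?: string;
}

type StepperAction = { type: "initialize_from_url"; query: Query };

// Rendered as a loading bar until the router has provided the query params.
const initialState: StepperStateShape = { stepperState: "uninitialized" };

// Repeated query params arrive as arrays; take the first value.
function firstValue(value: string | string[] | undefined): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

function stepperReducer(
  state: StepperStateShape,
  action: StepperAction
): StepperStateShape {
  switch (action.type) {
    case "initialize_from_url": {
      const taskId = firstValue(action.query["taskId"]);
      // With a task ID in the URL, resume polling instead of showing the form.
      return taskId
        ? { stepperState: "awaiting_import", importTaskId: taskId }
        : { stepperState: "unstarted" };
    }
    default:
      return state;
  }
}

// Inverse direction: the query params to write to the URL when state changes.
function toQuery(state: StepperStateShape): Record<string, string> {
  return state.importTaskId ? { taskId: state.importTaskId } : {};
}
```

In a component, the action would be dispatched from an effect once the router reports that its query is ready.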

Export queries to tables `monthly_user_stats` and `monthly_issue_stats` in the same schema/namespace as the tables. We also export the tables, or at least the few that we explicitly asked to import.

After an import/export has completed, insert a row into the meta table, which we will also use to fetch the previously imported repositories from the client side when rendering the sidebar. We don't have transactional guarantees on the DDN, so we can't do `INSERT ON CONFLICT`, so instead we avoid duplicate rows by first selecting the existing row, and returning `204` if it's already been inserted into the `completed_repositories` table. However, I did notice that when I inserted the same row twice, it only showed up once when I made a selection in the Console. I don't know if this was due to a race condition, a bug, or because it's using the entire row as a compound primary key and for some reason requiring that it be unique.
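The select-then-insert workaround might be sketched as below. Only the `completed_repositories` table name and the `204` response come from the description; the column names, the `200` status for a fresh insert, and all function names are hypothetical. The sketch also keeps the decision logic separate from the actual DDN calls, which is a design choice made here for readability.

```typescript
// Hypothetical row shape for the meta table.
interface CompletedRepo {
  githubNamespace: string;
  githubRepository: string;
}

// Quote a SQL string literal by doubling embedded single quotes.
const quote = (s: string): string => `'${s.replace(/'/g, "''")}'`;

// Step 1: look for an existing row (column names are assumed).
function selectExistingSql(repo: CompletedRepo): string {
  return (
    "SELECT 1 FROM completed_repositories" +
    ` WHERE github_namespace = ${quote(repo.githubNamespace)}` +
    ` AND github_repository = ${quote(repo.githubRepository)} LIMIT 1`
  );
}

// Step 2: only reached when step 1 returned no rows.
function insertSql(repo: CompletedRepo): string {
  return (
    "INSERT INTO completed_repositories (github_namespace, github_repository)" +
    ` VALUES (${quote(repo.githubNamespace)}, ${quote(repo.githubRepository)})`
  );
}

type MetaInsertPlan = { status: 204 } | { status: 200; insertSql: string };

// Decide what the route should do, given the rows returned by step 1.
function planMetaInsert(existingRows: unknown[], repo: CompletedRepo): MetaInsertPlan {
  return existingRows.length > 0
    ? { status: 204 } // already recorded, nothing to insert
    : { status: 200, insertSql: insertSql(repo) };
}
```

Because the select and the insert are separate statements, two concurrent requests can still both insert; this narrows the window for duplicates without closing it.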

…he charts The sidebar queries the DDN from the client-side with `useSql` from `@madatdata/react`, using the default anonymous (thus read-only) credential to query the "metadata table" that includes the list of repositories that have had a successful import, and it links to a page for each one, which is currently a stub but where we will show the chart(s) with Observable Plot.

…istles, etc. Render each export(able|ed) query/table in a Splitgraph embed, with a tabbed container for switching to a Seafowl embed when it's ready, i.e. show each table/query individually, inline with its loading state.

…tead of each component Previously, the API always returned a unique taskId for each table being exported, but a recent change optimized it to return one taskId for the set of tables being exported, but still one taskId for each query being exported. Also previously, this demo code rendered a loading component for each table, and each component had its own hook for polling the taskId of that table. But now that multiple tables can share a taskId, it doesn't make sense for each component to poll for its own taskId. Now, we track the set of taskIds separately from the set of completed tables, and we only poll for unique taskIds, which we do in a hook instead of in each component. And each table preview checks the set of completed tables to know whether it's been completed.
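The bookkeeping this implies can be sketched as a few pure functions: derive the unique task IDs to poll, and map a finished task back to the tables it covers. The `taskId`/`tableName` pair matches the shape returned by the start-export route; the function names are hypothetical and the real hook is not shown here.

```typescript
interface ExportItem {
  taskId: string;
  tableName: string;
}

// Several tables may share one task ID, so poll each distinct ID only once.
function uniqueTaskIds(items: ExportItem[]): string[] {
  return items
    .map((item) => item.taskId)
    .filter((id, index, all) => all.indexOf(id) === index);
}

// When a task finishes, every table that shares its ID is complete.
function tablesCompletedBy(items: ExportItem[], taskId: string): string[] {
  return items
    .filter((item) => item.taskId === taskId)
    .map((item) => item.tableName);
}

// What each table preview would check instead of polling on its own.
function isTableComplete(completedTables: string[], tableName: string): boolean {
  return completedTables.indexOf(tableName) !== -1;
}
```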

For each query to export, optionally provide a fallback `CREATE TABLE` query which will run if the export job for the query fails. Implement this by calling an API route `/api/create-fallback-table-after-failed-export` after an export for a query fails for any reason. This works around the bug where queries with an empty result fail to export to Seafowl, see: splitgraph/seafowl#423
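One way to picture the fallback decision is shown below. The route path is the one named above; the `ExportQuery` field names, the request body shape, and the `fallbackRequestFor` helper are assumptions for illustration only.

```typescript
// Hypothetical description of one query to export.
interface ExportQuery {
  destinationTable: string;
  sourceQuery: string;
  fallbackCreateTableQuery?: string; // optional, as described above
}

interface FallbackRequest {
  url: string;
  body: { destinationTable: string; fallbackCreateTableQuery: string };
}

// Build the fallback request only when the export failed AND a fallback exists.
function fallbackRequestFor(
  query: ExportQuery,
  exportFailed: boolean
): FallbackRequest | null {
  if (!exportFailed || !query.fallbackCreateTableQuery) {
    return null;
  }
  return {
    url: "/api/create-fallback-table-after-failed-export",
    body: {
      destinationTable: query.destinationTable,
      fallbackCreateTableQuery: query.fallbackCreateTableQuery,
    },
  };
}
```

A query without a fallback simply surfaces the export error as before.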

…tinationTable The point of exporting a query from Splitgraph to Seafowl is that once the result is in Seafowl, we can just select from the destinationTable and forget about the original query (which might not even be compatible with Seafowl). So make sure that when we're embedding an exported query, we only render the query in the embedded Splitgraph query editor, and for the embedded Seafowl Console, we render a query that simply selects from the destinationTable.
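A small sketch of that rule: the Splitgraph embed gets the original query, and the Seafowl embed gets a plain select from the destination table. The type and function names, and the separate `destinationSchema` field, are assumptions rather than the PR's actual code.

```typescript
// Hypothetical shape of an exported query.
interface ExportedQuery {
  sourceQuery: string; // runs on Splitgraph, may not be valid on Seafowl
  destinationSchema: string;
  destinationTable: string;
}

type EmbedTarget = "splitgraph" | "seafowl";

// Quote a SQL identifier by doubling embedded double quotes.
const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;

function embedQueryFor(target: EmbedTarget, query: ExportedQuery): string {
  return target === "splitgraph"
    ? query.sourceQuery
    : `SELECT * FROM ${quoteIdent(query.destinationSchema)}.${quoteIdent(
        query.destinationTable
      )};`;
}
```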

Migrate pull request from: splitgraph/madatdata#21 into its own repo, using `git-filter-repo` to include only commits from subdirectory `examples/nextjs-import-airbyte-github-export-seafowl/` ref: https://github.com/newren/git-filter-repo This commit adds the Yarn files necessary for running the example in an isolated repo (as opposed to as one of multiple examples in a shared multi-workspace `examples`), points the dependencies to `canary` versions (which reflect versions in splitgraph/madatdata#20), and also updates the readme with information for running in local development.

milesrichardson · 2023-08-02T01:43:34Z

This PR has been filtered into its own repo: https://github.com/splitgraph/demo-github-analytics

The demo is deployed to: https://demo-github-analytics.vercel.app/

(It's still a bit fragile - don't try to import a big repository with lots of issues/commits, since it will trigger a multi-hour ingestion job...)

milesrichardson force-pushed the example-nextjs-import-airbyte-github-export-seafowl branch from 6369655 to 301cb59 Compare June 9, 2023 23:11

milesrichardson force-pushed the example-nextjs-import-airbyte-github-export-seafowl branch from f171fa4 to 140f865 Compare June 23, 2023 03:21

milesrichardson added 28 commits June 29, 2023 03:05

Add stub examples/nextjs-import-airbyte-github-export-seafowl/

0f64a81

Stub out layout and sidebar of GitHub analytics example

776e11f

Add backend config and API routes for starting, awaiting import task

1f3b6d1

Move lib-backend to lib/backend

8ba4151

Move lib/config.ts -> lib/config/github-tables.ts

9e47b5a

Export analytics queries to Seafowl in addition to tables

d60cd4d

Support ?debug=1 parameter in URL of stepper to render DebugPane

3ba9e97

Install @observablehq/plot in `example-nextjs-import-airbyte-github…

613a774

…-export-seafowl`

Render stargazers line chart with Observable Plot querying Seafowl

309203d

Refactor Observable plot: Add useSqlPlot hook and make file per plot

9b58189

Refactor: use better name than acc in reduce function

5f4d6b5

Add styling, bells and whistles to stepper (text, buttons, loading ba…

b362bc2

…rs, etc.)

Refactor

4454199

Display preview table for each loading/completed: either embed Splitg…

cb71604

…raph, or Seafowl

Allow toggling between Splitgraph/Seafowl embeds (once export has com…

cc8cde1

…pleted)

Use <a> tag for "Import Your Repository" link, to completely reset …

2b51141

…state (force SSR)

Add formatTimeElapsed function prop to LoadingBar for custom form…

f43d0e8

…atting of time elapsed message

Refactor

ecd1e43

milesrichardson added 7 commits June 29, 2023 03:05

Move embedded preview components to be shared with export panel and r…

4cc3e41

…epo page

Add reduceRows method to useSqlPlot for case where mapping isn't …

ab57659

…enough

Add stacked bar chart of issue reactions by month with bars broken do…

15d14e2

…wn by reaction type

Make repo page have three tabs: tables, queries and charts

06c57b9

Rename chart to IssueReactsByMonth

93022b1

Bump GitHub import page size to 100, start date to 2023-01-01

afa4c07

milesrichardson force-pushed the example-nextjs-import-airbyte-github-export-seafowl branch from 43a9de7 to afa4c07 Compare June 29, 2023 02:27

Add scatter plot of user comment length vs. lines of code

06d94f7
