Draft: Example nextjs site that imports repositories from GitHub to Splitgraph, then exports them to Seafowl, then renders a chart of stargazers #21
Open: milesrichardson wants to merge 36 commits into generated-import-plugins from example-nextjs-import-airbyte-github-export-seafowl
milesrichardson force-pushed the example-nextjs-import-airbyte-github-export-seafowl branch from 6369655 to 301cb59 (June 9, 2023 23:11)
milesrichardson force-pushed the example-nextjs-import-airbyte-github-export-seafowl branch from f171fa4 to 140f865 (June 23, 2023 03:21)
Assisted by the one and only GPT-4
Implement the import panel and stub out the export panel, using a single Stepper component and a react context with a reducer for managing the state. Implement the fetch requests to start the import, and also to await the import. Co-Authored by GPT-4 ;)
…ort-to-seafowl-task` The `start-export-to-seafowl` route takes a list of source tables from Splitgraph (list of `{namespace,repository,table}`), and starts a task to export them to Seafowl. It returns a list of objects `{taskId: string; tableName: string;}`, where each item represents the currently exporting table (and `tableName` is the source table name). The `await-export-to-seafowl-task` route takes a single `taskId` parameter and returns its status, i.e. `{completed: boolean; ...otherInfo}`
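The request and response shapes described above can be written down as types. This is only a sketch: the field names come from the commit message, while the helper names (`buildStartExportBody`, `isExportingTableList`, `isTaskCompleted`) and the `{ tables: [...] }` request body are assumptions made for illustration, not the routes' actual code.

```typescript
// A source table in Splitgraph, as listed in the commit message.
interface SourceTable {
  namespace: string;
  repository: string;
  table: string;
}

// One item of the start-export-to-seafowl response.
interface ExportingTable {
  taskId: string;
  tableName: string; // name of the source table
}

// Response of await-export-to-seafowl-task: `completed` plus unspecified extra info.
interface ExportTaskStatus {
  completed: boolean;
  [otherInfo: string]: unknown;
}

// Hypothetical helper: serialize the request body (the body shape is an assumption).
function buildStartExportBody(tables: SourceTable[]): string {
  return JSON.stringify({ tables });
}

// Hypothetical runtime guard for the start-export response.
function isExportingTableList(value: unknown): value is ExportingTable[] {
  if (!Array.isArray(value)) return false;
  return value.every((item) => {
    if (typeof item !== "object" || item === null) return false;
    const record = item as { taskId?: unknown; tableName?: unknown };
    return typeof record.taskId === "string" && typeof record.tableName === "string";
  });
}

// Hypothetical helper: true once the polled task reports completion.
function isTaskCompleted(status: ExportTaskStatus): boolean {
  return status.completed === true;
}
```

Validating the response at the boundary keeps the panel components free of defensive checks on `taskId` and `tableName`.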
The ExportPanel first renders a "Start Export" button. Then, while the export is running, it renders an `ExportTableLoadingBar` for each table that is being exported. Each of these individual components sends its own polling request with its `taskId` to the `await-export-to-seafowl-task` endpoint and, upon completion of its task, sends an action to the reducer, which handles it by updating the set of loading tasks. When every task in the set has completed, the reducer changes the `stepperState` to `export_complete`. If any of the tasks has an error, then the `stepperState` changes to `export_error`, which should cause all loading bars to unmount - i.e., any error will short-circuit all of the table loading, even if some tables would otherwise have completed. At that point the user can click "Start Export" again.

This completes the logic necessary for import and export, and now it's just a matter of styling the components, linking to the Splitgraph Console, adding explanatory text, and finally rendering a chart with the data. We'll also want to create a meta-repository in Splitgraph for tracking which GitHub repos we've imported so far, analogously to how we track Socrata metadata for each Socrata repo.
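The export state transitions described in this commit can be sketched as a plain reducer. Only `export_complete` and `export_error` are names from the commit message; `awaiting_export`, the action names, and the use of an array in place of the "set of loading tasks" are assumptions for the sake of a self-contained example.

```typescript
type ExportStepperState =
  | "awaiting_export" // assumed name for "export is running"
  | "export_complete"
  | "export_error";

interface ExportState {
  stepperState: ExportStepperState;
  loadingTaskIds: string[]; // array standing in for the set of loading tasks
  error?: string;
}

// Action names are hypothetical.
type ExportAction =
  | { type: "start_export"; taskIds: string[] }
  | { type: "export_task_complete"; taskId: string }
  | { type: "export_error"; message: string };

function exportReducer(state: ExportState, action: ExportAction): ExportState {
  switch (action.type) {
    case "start_export":
      return { stepperState: "awaiting_export", loadingTaskIds: action.taskIds.slice() };
    case "export_task_complete": {
      // Ignore completions that arrive after an error has short-circuited the export.
      if (state.stepperState !== "awaiting_export") return state;
      const remaining = state.loadingTaskIds.filter((id) => id !== action.taskId);
      return {
        stepperState: remaining.length === 0 ? "export_complete" : "awaiting_export",
        loadingTaskIds: remaining,
      };
    }
    case "export_error":
      // Any error clears every loading task, which unmounts all loading bars.
      return { stepperState: "export_error", loadingTaskIds: [], error: action.message };
    default:
      return state;
  }
}
```

Because the error case empties `loadingTaskIds`, a later "Start Export" click begins from a clean state rather than from a partially completed set.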
The `airbyte-github` plugin by default imports 163 tables into Splitgraph, but we only need a few of them for the analytics queries we want to make in the demo app. So, hardcode the list of those, but also hardcode the list of all 163 tables for reference, and also the 43 tables that are imported given the relevant tables (because either they depend on them via a foreign key relationship, or they're an airbyte meta table).

For the 43 tables, see this recent import of `splitgraph/seafowl`:
* https://www.splitgraph.com/miles/github-import-splitgraph-seafowl/20230526-224723/-/tables

This took 3 minutes and 40 seconds to import into Splitgraph.
…umed across page loads

Keep track of the current stepper state (e.g. taskId, import completion, etc.) in the URL. Update the URL when the state changes, and initialize the state from the URL on page load. Note that we need to default to an "uninitialized" state, and then update the state from the URL via an `initialize_from_url` action, because the `useRouter` hook is asynchronous, and we don't look at query parameters on the server side with `getInitialProps` or similar. Thus we can show a loading bar before showing the import form (or whatever we're showing based on the current state).

This makes development easier, since after a long import we can refresh the page with the URL containing the task ID and start from there, rather than re-importing every time. It also makes things easier for users, who can refresh the page without losing progress if an import has already started (it will just poll the taskId from the URL).
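A minimal sketch of the URL round trip described here, written as pure functions so it runs without Next.js. The `uninitialized` state and the `initialize_from_url` action are from the commit message; the other step names, the query parameter names (`stepperState`, `taskId`), and the function names are assumptions. The `Query` type mirrors the shape of Next.js `router.query`.

```typescript
// Same shape as Next.js `router.query`: each key is a string, string[], or undefined.
type Query = Record<string, string | string[] | undefined>;

// Only "uninitialized" is named in the commit message; the rest are assumed.
type ImportStep = "uninitialized" | "unstarted" | "awaiting_import" | "import_complete";

interface ImportStepperState {
  step: ImportStep;
  importTaskId?: string;
}

type ImportStepperAction = { type: "initialize_from_url"; query: Query };

function firstValue(value: string | string[] | undefined): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

// Rebuild the state from query parameters, falling back to a fresh start.
function stateFromQuery(query: Query): ImportStepperState {
  const step = firstValue(query["stepperState"]);
  const taskId = firstValue(query["taskId"]);
  if ((step === "awaiting_import" || step === "import_complete") && taskId) {
    return { step, importTaskId: taskId };
  }
  return { step: "unstarted" };
}

// Inverse direction: the query parameters to write whenever the state changes.
function queryFromState(state: ImportStepperState): Record<string, string> {
  const query: Record<string, string> = {};
  if (state.step === "awaiting_import" || state.step === "import_complete") {
    query["stepperState"] = state.step;
  }
  if (state.importTaskId) query["taskId"] = state.importTaskId;
  return query;
}

function importStepperReducer(
  state: ImportStepperState,
  action: ImportStepperAction
): ImportStepperState {
  switch (action.type) {
    case "initialize_from_url":
      // Only the initial "uninitialized" state may be replaced from the URL.
      return state.step === "uninitialized" ? stateFromQuery(action.query) : state;
    default:
      return state;
  }
}
```

In a page component, the action would presumably be dispatched once the router reports that its query is available, and the loading bar would be shown while `step` is still `uninitialized`.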
Export queries to tables `monthly_user_stats` and `monthly_issue_stats` in the same schema/namespace as the tables. We also export the tables, or at least the few that we explicitly asked to import.
After an import/export has completed, insert a row into the meta table, which we will also use to fetch the previously imported repositories from the client side when rendering the sidebar. We don't have transactional guarantees on the DDN, so we can't do `INSERT ON CONFLICT`, so instead we avoid duplicate rows by first selecting the existing row, and returning `204` if it's already been inserted into the `completed_repositories` table. However, I did notice that when I inserted the same row twice, it only showed up once when I made a selection in the Console. I don't know if this was due to a race condition, a bug, or because it's using the entire row as a compound primary key and for some reason requiring that it be unique.
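The select-then-insert workaround can be sketched as follows. The table name `completed_repositories` and the `204` response come from the commit message; the column names, the `RunSql` signature, the `201` status for a fresh insert, and all function names are assumptions. Note that the check and the insert are two separate statements, so two concurrent requests could still both pass the check.

```typescript
interface CompletedRepository {
  githubNamespace: string;
  githubRepository: string;
}

// Assumed signature: runs one SQL statement and resolves to the result rows.
type RunSql = (sql: string) => Promise<unknown[]>;

// Quote a string literal by doubling embedded single quotes.
function quoteLiteral(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Column names below are hypothetical.
function selectExistingSql(repo: CompletedRepository): string {
  return (
    "SELECT 1 FROM completed_repositories" +
    ` WHERE github_namespace = ${quoteLiteral(repo.githubNamespace)}` +
    ` AND github_repository = ${quoteLiteral(repo.githubRepository)} LIMIT 1`
  );
}

function insertSql(repo: CompletedRepository): string {
  return (
    "INSERT INTO completed_repositories (github_namespace, github_repository)" +
    ` VALUES (${quoteLiteral(repo.githubNamespace)}, ${quoteLiteral(repo.githubRepository)})`
  );
}

// Returns an HTTP-style status: 204 if the row already exists, 201 (assumed) after inserting.
async function markRepositoryCompleted(
  repo: CompletedRepository,
  runSql: RunSql
): Promise<number> {
  const existing = await runSql(selectExistingSql(repo));
  if (existing.length > 0) return 204;
  await runSql(insertSql(repo));
  return 201;
}
```

Passing `runSql` in as a parameter keeps the sketch independent of any particular database client and makes the duplicate check easy to exercise with a stub.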
…he charts The sidebar queries the DDN from the client side with `useSql` from `@madatdata/react`, using the default anonymous (thus read-only) credential to query the "metadata table" that includes the list of repositories that have had a successful import. It links to a page for each one, which is currently a stub but is where we will show the chart(s) with Observable Plot.
…-export-seafowl`
…state (force SSR)
…atting of time elapsed message
…istles, etc. Render each export(able|ed) query/table in a Splitgraph embed, with a tabbed container for switching to a Seafowl embed when it's ready, i.e. show each table/query individually, inline with its loading state.
…tead of each component Previously, the API always returned a unique taskId for each table being exported, but a recent change optimized it to return one taskId for the set of tables being exported, but still one taskId for each query being exported. Also previously, this demo code rendered a loading component for each table, and each component had its own hook for polling the taskId of that table. But now that multiple tables can share a taskId, it doesn't make sense for each component to poll for its own taskId. Now, we track the set of taskIds separately from the set of completed tables, and we only poll for unique taskIds, which we do in a hook instead of in each component. And each table preview checks the set of completed tables to know whether it's been completed.
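The bookkeeping described here, tracking unique taskIds separately from completed tables, might look roughly like this. It is a sketch of the state handling only, not of the polling hook itself; the type and function names are assumptions, and arrays stand in for the sets.

```typescript
interface ExportTask {
  taskId: string; // may be shared by several tables
  tableName: string;
}

interface ExportProgress {
  pendingTaskIds: string[]; // unique taskIds still being polled
  completedTables: string[]; // table names whose export has finished
}

// Deduplicate taskIds so each one is polled once, however many tables share it.
function uniqueTaskIds(tasks: ExportTask[]): string[] {
  return tasks
    .map((task) => task.taskId)
    .filter((id, index, all) => all.indexOf(id) === index);
}

function startTracking(tasks: ExportTask[]): ExportProgress {
  return { pendingTaskIds: uniqueTaskIds(tasks), completedTables: [] };
}

// When one taskId completes, every table that shares it becomes completed.
function completeTask(
  progress: ExportProgress,
  tasks: ExportTask[],
  taskId: string
): ExportProgress {
  const finished = tasks
    .filter((task) => task.taskId === taskId)
    .map((task) => task.tableName)
    .filter((name) => progress.completedTables.indexOf(name) === -1);
  return {
    pendingTaskIds: progress.pendingTaskIds.filter((id) => id !== taskId),
    completedTables: progress.completedTables.concat(finished),
  };
}

// What each table preview would check instead of polling on its own.
function isTableCompleted(progress: ExportProgress, tableName: string): boolean {
  return progress.completedTables.indexOf(tableName) !== -1;
}
```

With this split, a single hook can poll `pendingTaskIds`, and each preview component only reads `isTableCompleted` for its own table.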
For each query to export, optionally provide a fallback `CREATE TABLE` query which will run if the export job for the query fails. Implement this by calling an API route `/api/create-fallback-table-after-failed-export` after an export for a query fails for any reason. This works around the bug where queries with an empty result fail to export to Seafowl, see: splitgraph/seafowl#423
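The decision of what to do after a query export finishes can be sketched as a small pure function. The route path is the one named in the commit message; the type names, field names, and the `nextStepAfterExport` helper are assumptions made for illustration.

```typescript
interface QueryToExport {
  destinationTable: string;
  sourceQuery: string;
  fallbackCreateTableQuery?: string; // optional, per the commit message
}

type ExportOutcome =
  | { status: "completed" }
  | { status: "failed"; reason: string };

type NextStep =
  | { kind: "done" }
  | { kind: "run_fallback"; route: string; createTableQuery: string }
  | { kind: "report_error"; reason: string };

// Route named in the commit message.
const FALLBACK_ROUTE = "/api/create-fallback-table-after-failed-export";

// Decide what happens once the export job for one query has finished.
function nextStepAfterExport(query: QueryToExport, outcome: ExportOutcome): NextStep {
  if (outcome.status === "completed") return { kind: "done" };
  if (query.fallbackCreateTableQuery !== undefined) {
    // The export failed for any reason: create the (possibly empty) table instead.
    return {
      kind: "run_fallback",
      route: FALLBACK_ROUTE,
      createTableQuery: query.fallbackCreateTableQuery,
    };
  }
  return { kind: "report_error", reason: outcome.reason };
}
```

Keeping the decision separate from the request that calls the route means the fallback rule can be checked without a running server.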
…tinationTable The point of exporting a query from Splitgraph to Seafowl is that once the result is in Seafowl, we can just select from the destinationTable and forget about the original query (which might not even be compatible with Seafowl). So make sure that when we're embedding an exported query, we only render the query in the embedded Splitgraph query editor, and for the embedded Seafowl Console, we render a query that simply selects from the destinationTable.
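As a sketch of that rule, the two embeds can derive their SQL from the same exported-query record. `destinationTable` is the name used in the commit message; `destinationSchema`, `sourceQuery`, and the function names are assumptions.

```typescript
interface ExportedQuery {
  sourceQuery: string; // original query, possibly incompatible with Seafowl
  destinationSchema: string; // assumed field name
  destinationTable: string;
}

// Quote an identifier by doubling embedded double quotes.
function quoteIdentifier(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// The Splitgraph embed shows the original query unchanged.
function splitgraphEmbedQuery(exported: ExportedQuery): string {
  return exported.sourceQuery;
}

// The Seafowl embed ignores the original query and selects from the exported table.
function seafowlEmbedQuery(exported: ExportedQuery): string {
  const schema = quoteIdentifier(exported.destinationSchema);
  const table = quoteIdentifier(exported.destinationTable);
  return `SELECT * FROM ${schema}.${table};`;
}
```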
…wn by reaction type
milesrichardson force-pushed the example-nextjs-import-airbyte-github-export-seafowl branch from 43a9de7 to afa4c07 (June 29, 2023 02:27)
milesrichardson added a commit to splitgraph/demo-github-analytics that referenced this pull request (Aug 2, 2023)
Migrate pull request from: splitgraph/madatdata#21 into its own repo, using `git-filter-repo` to include only commits from subdirectory `examples/nextjs-import-airbyte-github-export-seafowl/` ref: https://github.com/newren/git-filter-repo This commit adds the Yarn files necessary for running the example in an isolated repo (as opposed to as one of multiple examples in a shared multi-workspace `examples`), points the dependencies to `canary` versions (which reflect versions in splitgraph/madatdata#20), and also updates the readme with information for running in local development.
This PR has been filtered into its own repo: https://github.com/splitgraph/demo-github-analytics

The demo is deployed to: https://demo-github-analytics.vercel.app/

(It's still a bit fragile - don't try to import a big repository with lots of issues/commits, since it will trigger a multi-hour ingestion job...)