Update Get OSO data (#1736)

* Update index.mdx
* Update api.md
* Update query-data.mdx
* Update python-notebooks.md
* Update 3rd-party.mdx
* fix: run prettier across docs markdown

---------

Co-authored-by: Raymond Cheng <[email protected]>
opensource-observer · Jul 1, 2024 · c38cc30
1 parent 3ad9ecd
commit c38cc30

Showing 10 changed files with 194 additions and 152 deletions.
diff --git a/apps/docs/docs/contribute/connect-data/dagster-config.mdx b/apps/docs/docs/contribute/connect-data/dagster-config.mdx
@@ -31,7 +31,6 @@ it may look like this:
 
 ![Dagster deployment](./dagster_deployments.png)
 
-
 ### Run it!
 
 If this is your first time adding an asset,
@@ -43,7 +42,7 @@ You can monitor all Dagster runs
 
 ![Dagster run example](./dagster_run.png)
 
-Dagster also provides 
+Dagster also provides
 [automation](https://docs.dagster.io/concepts/automation)
 to run jobs on a
 [schedule](https://docs.dagster.io/concepts/automation/schedules)

diff --git a/apps/docs/docs/contribute/index.mdx b/apps/docs/docs/contribute/index.mdx
@@ -8,50 +8,84 @@ There are a variety of ways you can contribute to OSO. This doc features some of
 :::
 
 <table>
-<thead>
-<tr>
-<th style={{textAlign: 'left'}}>Contribution Type</th>
-<th style={{textAlign: 'left'}}>GitHub Repo</th>
-<th style={{textAlign: 'left'}}>Description</th>
-<th style={{textAlign: 'left'}}>Type of Contributor</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td><a href="./project-data">Update Project Data</a></td>
-<td><a href="https://github.com/opensource-observer/oss-directory">oss-directory</a></td>
-<td>Add a new project or update info for an existing project.</td>
-<td>OSS Projects, Analysts, General Public</td>
-</tr>
-<tr>
-<td><a href="./connect-data/funding-data">Add Funding Data</a></td>
-<td><a href="https://github.com/opensource-observer/oss-funding">oss-funding</a></td>
-<td>Add to our database of OSS funding via CSV upload.</td>
-<td>OSS Funders, Analysts</td>
-</tr>
-<tr>
-<td><a href="./connect-data">Connect Your Data</a></td>
-<td><a href="https://github.com/opensource-observer/oso">oso</a></td>
-<td>Write a plugin or help us replicate your data in the OSO data warehouse.</td>
-<td>Data Engineers, Developers</td>
-</tr>
-<tr>
-<td><a href="./impact-models">Propose an Impact Data Model</a></td>
-<td><a href="https://github.com/opensource-observer/oso">oso</a></td>
-<td>Submit a dbt data model for tracking open source impact metrics.</td>
-<td>Data Scientists, Analysts</td>
-</tr>
-<tr>
-<td><a href="./share-insights">Share Insights</a></td>
-<td><a href="https://github.com/opensource-observer/insights">insights</a></td>
-<td>Contribute to our library of data visualizations and Jupyter notebooks.</td>
-<td>Data Scientists, Analysts</td>
-</tr>
-<tr>
-<td><a href="./challenges">Join a Data Challenge</a></td>
-<td><a href="https://github.com/opensource-observer/insights">insights</a></td>
-<td>Work on a specific data challenge and get paid for your contributions.</td>
-<td>Data Scientists, Analysts</td>
-</tr>
-</tbody>
+  <thead>
+    <tr>
+      <th style={{ textAlign: "left" }}>Contribution Type</th>
+      <th style={{ textAlign: "left" }}>GitHub Repo</th>
+      <th style={{ textAlign: "left" }}>Description</th>
+      <th style={{ textAlign: "left" }}>Type of Contributor</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>
+        <a href="./project-data">Update Project Data</a>
+      </td>
+      <td>
+        <a href="https://github.com/opensource-observer/oss-directory">
+          oss-directory
+        </a>
+      </td>
+      <td>Add a new project or update info for an existing project.</td>
+      <td>OSS Projects, Analysts, General Public</td>
+    </tr>
+    <tr>
+      <td>
+        <a href="./connect-data/funding-data">Add Funding Data</a>
+      </td>
+      <td>
+        <a href="https://github.com/opensource-observer/oss-funding">
+          oss-funding
+        </a>
+      </td>
+      <td>Add to our database of OSS funding via CSV upload.</td>
+      <td>OSS Funders, Analysts</td>
+    </tr>
+    <tr>
+      <td>
+        <a href="./connect-data">Connect Your Data</a>
+      </td>
+      <td>
+        <a href="https://github.com/opensource-observer/oso">oso</a>
+      </td>
+      <td>
+        Write a plugin or help us replicate your data in the OSO data warehouse.
+      </td>
+      <td>Data Engineers, Developers</td>
+    </tr>
+    <tr>
+      <td>
+        <a href="./impact-models">Propose an Impact Data Model</a>
+      </td>
+      <td>
+        <a href="https://github.com/opensource-observer/oso">oso</a>
+      </td>
+      <td>Submit a dbt data model for tracking open source impact metrics.</td>
+      <td>Data Scientists, Analysts</td>
+    </tr>
+    <tr>
+      <td>
+        <a href="./share-insights">Share Insights</a>
+      </td>
+      <td>
+        <a href="https://github.com/opensource-observer/insights">insights</a>
+      </td>
+      <td>
+        Contribute to our library of data visualizations and Jupyter notebooks.
+      </td>
+      <td>Data Scientists, Analysts</td>
+    </tr>
+    <tr>
+      <td>
+        <a href="./challenges">Join a Data Challenge</a>
+      </td>
+      <td>
+        <a href="https://github.com/opensource-observer/insights">insights</a>
+      </td>
+      <td>
+        Work on a specific data challenge and get paid for your contributions.
+      </td>
+      <td>Data Scientists, Analysts</td>
+    </tr>
+  </tbody>
 </table>
diff --git a/apps/docs/docs/get-started/index.mdx b/apps/docs/docs/get-started/index.mdx
@@ -8,17 +8,17 @@ import Button from "../../src/components/plasmic/Button";
 
 :::info
 There are two easy ways of accessing OSO datasets: through our GraphQL API
-and through our data warehouse on BigQuery. 
+and through our data warehouse on BigQuery.
 For live integrations, you'll want [API access](../integrate/api.md).
 For exploratory analysis and impact data science,
-it's best to go direct to the data warehouse. 
+it's best to go direct to the data warehouse.
 :::
 
 OSO's data warehouse is currently located in BigQuery on Google Cloud (GCP).
 Every data model is made publicly available by a BigQuery dataset.
 
 See our [data overview](../integrate/overview/index.mdx)
-for a full list of public data sets. 
+for a full list of public data sets.
 
 ## Sign up for Google Cloud
 
@@ -45,7 +45,7 @@ Finally, you will be brought to the admin console where you can create a new pro
 Feel free to name this GCP project anything you'd like.
 (Or you can simply leave the default project name 'My First Project'.)
 
-*Note: you won't be able to create a new project if you're not an administrator of your Google organization*
+_Note: you won't be able to create a new project if you're not an administrator of your Google organization_
 
 ![GCP Create](./gcp_create.png)
 
@@ -70,7 +70,9 @@ Click on the following link to subscribe to the OSO production dataset:
   size={"compact"}
   color={"blue"}
   target={"_blank"}
-  link={"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"}
+  link={
+    "https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"
+  }
   children={"Subscribe on BigQuery"}
 />
 
@@ -84,7 +86,7 @@ Open a new tab by clicking on the `+` icon
 on the top right of the console to `Create SQL Query`.
 
 From here you will be able to write any SQL you'd like any OSO dataset.
-For example, you can query the `oso_production` dataset for 
+For example, you can query the `oso_production` dataset for
 all available collections like this:
 
 ```sql
@@ -102,12 +104,12 @@ The results will appear in a table at the bottom of the console.
 The console will help you complete your query as you type, and will also provide you with a preview of the results and computation time. You can save your queries, download the results, and even make simple visualizations directly from the console.
 
 :::tip
-To explore all the OSO datasets available, see the 
+To explore all the OSO datasets available, see the
 [Data Overview](../integrate/overview/index.mdx).
 
-- **oso\_production** contains all production data. This can be quite large depending on the dataset.
-- **oso\_playground** contains only the last 2 weeks for every dataset. We recommend using this for development and testing.
-:::
+- **oso_production** contains all production data. This can be quite large depending on the dataset.
+- **oso_playground** contains only the last 2 weeks for every dataset. We recommend using this for development and testing.
+  :::
 
 ## Next steps
 
@@ -119,10 +121,15 @@ Now that you're set up, there are many ways to contribute to OSO and integrate t
 - [Query the OSO API](../integrate/api.md) for metrics and impact vectors from your web app
 
 If you think you'll be an ongoing contributor to OSO,
-please apply to join the [Kariba Data Collective](https://www.kariba.network). 
+please apply to join the [Kariba Data Collective](https://www.kariba.network).
 
 Membership is free but we want to keep the community close-knit and mission-aligned.
 As the community grows, we want to reward the most useful contributions and
 in so doing create a [new job category for impact data science](https://docs.opensource.observer/blog/impact-data-scientists).
 
-<Link to="https://www.kariba.network" className="button button--secondary button--lg">Join the Data Collective</Link>
+<Link
+  to="https://www.kariba.network"
+  className="button button--secondary button--lg"
+>
+  Join the Data Collective
+</Link>
diff --git a/apps/docs/docs/how-oso-works/impact-metrics/index.mdx b/apps/docs/docs/how-oso-works/impact-metrics/index.mdx
@@ -3,9 +3,8 @@ title: Specification
 sidebar_position: 1
 ---
 
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";
 
 :::info
 An **impact metric** is a quantitative measure of impact over a discrete period of time. Impact metrics are most commonly queried by project (eg, `uniswap`), although they can also be queried by individual artifact or at the collection level.
@@ -19,7 +18,7 @@ Impact metrics should be designed with the following principles in mind:
 
 - **Verifiability**: Metrics should be based on public data that can be independently verified. They should not rely on proprietary data sources or private APIs.
 - **Reproducibility**: Metrics should be easy to reproduce, simulate, and audit to ensure they are achieving the intended results. They should not have a "black box" element that makes them difficult to understand or replicate.
-- **Consistency**: Metrics should be consistent across projects and artifacts. They should be calculated using the same methodology and data sources to ensure that they are comparable.  
+- **Consistency**: Metrics should be consistent across projects and artifacts. They should be calculated using the same methodology and data sources to ensure that they are comparable.
 - **Completeness**: Metrics should be comprehensive and cover all projects and artifacts in the OSO database that fulfill basic requirements. They should not be highly sector-specific.
 - **Simplicity**: Metrics should have business logic that is easy to understand. They should not require a deep understanding of the underlying data or complex statistical methods to interpret.
 
@@ -37,6 +36,7 @@ An impact metric must be:
 ## Schema
 
 Every impact metric must include the following fields: `project_id`, `impact_metric`, and `amount`. For example:
+
 ```json
 {
   "project_id": "jUda1pi-FdNlaUmgKq51B4h8x4wX3QTN2fZkKq6N0vw\u003d",
@@ -47,7 +47,6 @@ Every impact metric must include the following fields: `project_id`, `impact_met
 
 Currently all intermediate metrics are calculated [here](https://github.com/opensource-observer/oso/tree/main/warehouse/dbt/models/intermediate/metrics) and consolidated metrics are available as metrics marts [here](https://github.com/opensource-observer/oso/tree/main/warehouse/dbt/models/marts/metrics).
 
-
 ## Sample Metrics
 
 ---
@@ -172,6 +171,7 @@ Here's a more complex impact metric that uses several CTEs to calculate the numb
       group by
         project_id
     ```
+
   </TabItem>
   <TabItem value="response" label="Response">
     ```json

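The schema rule quoted in the impact-metrics hunk above (every impact metric must include `project_id`, `impact_metric`, and `amount`) can be illustrated with a small check. This is a minimal Python sketch under stated assumptions, not code from the OSO repository: the helper name `missing_metric_fields` and the sample values are hypothetical, and only the three field names come from the docs.

```python
# Required fields, as listed in the "Schema" section of the impact-metrics doc.
REQUIRED_FIELDS = ("project_id", "impact_metric", "amount")


def missing_metric_fields(record: dict) -> list:
    """Return the required fields absent from `record`, in schema order.

    Hypothetical helper for illustration; it checks presence only,
    not the types or values of the fields.
    """
    return [field for field in REQUIRED_FIELDS if field not in record]


# Hypothetical sample record; the values are placeholders, not real OSO data.
sample = {
    "project_id": "example-project-id",
    "impact_metric": "example_metric",
    "amount": 1.0,
}

if __name__ == "__main__":
    print(missing_metric_fields(sample))
    print(missing_metric_fields({"project_id": "example-project-id"}))
```

A presence-only check like this is the loosest reading of the schema sentence; a stricter validator would also need the field types, which the hunk shown here does not specify.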
diff --git a/apps/docs/docs/how-oso-works/index.mdx b/apps/docs/docs/how-oso-works/index.mdx
@@ -13,13 +13,13 @@ This section introduces some core concepts for understanding how OSO works and w
 
 OSO datasets are built from three primary registries: **collections**, **projects**, and **artifacts**. A **collection** is a group of projects. A **project** is a group of artifacts. An **artifact** is an open source work contribution that belongs to a project. These registries are completely public and maintained in [OSS Directory](https://github.com/opensource-observer/oss-directory).
 
-Critically, a project can belong to many collections but *an artifact may only belong to one project*. A new project is instantiated from a unique `name` and a GitHub URL that is not owned by any other project. These properties are validated whenever a collection, project, or artifact is added to OSS Directory.
+Critically, a project can belong to many collections but _an artifact may only belong to one project_. A new project is instantiated from a unique `name` and a GitHub URL that is not owned by any other project. These properties are validated whenever a collection, project, or artifact is added to OSS Directory.
 
 ![2_artifacts](./2_artifacts.png)
 
 After registering an initial set of artifacts, OSO looks downstream for additional artifacts to associate with a project.
 
-For example, when a project contains a **GitHub organization** artifact, OSO will index all repositories in that organization. When a project contains a blockchain **deployer address** artifact, OSO will index all contracts deployed by that address (including contracts deployed by factories deployed by that address). 
+For example, when a project contains a **GitHub organization** artifact, OSO will index all repositories in that organization. When a project contains a blockchain **deployer address** artifact, OSO will index all contracts deployed by that address (including contracts deployed by factories deployed by that address).
 
 ![3_events](./3_events.png)
 
@@ -29,4 +29,4 @@ For example, when a project contains a **GitHub organization** artifact, OSO wil
 
 ![4_impact_metrics](./4_impact_metrics.png)
 
-Finally, OSO aggregates this data to generate metrics and insights about the projects, collections, and artifacts in the OSS Directory. These metrics are available through the OSO website, API, and BigQuery.
+Finally, OSO aggregates this data to generate metrics and insights about the projects, collections, and artifacts in the OSS Directory. These metrics are available through the OSO website, API, and BigQuery.
diff --git a/apps/docs/docs/integrate/3rd-party.mdx b/apps/docs/docs/integrate/3rd-party.mdx
@@ -5,14 +5,12 @@ sidebar_position: 5
 
 import Button from "../../src/components/plasmic/Button";
 
-Because all OSO datasets and models are accessible as
-public datasets on BigQuery,
-connecting and exploring the data
+OSO datasets and models are public and can be accessed on BigQuery. This allows you to connect and explore the data using various tools.
 
 ## Subscribe to an OSO dataset
 
 First, we need to subscribe to an OSO dataset in your own
-Google Cloud account. 
+Google Cloud account.
 You can see all of our available datasets in the
 [Data Overview](./overview/index.mdx).
 
@@ -22,7 +20,9 @@ We recommend starting with the OSO production data pipeline here:
   size={"compact"}
   color={"blue"}
   target={"_blank"}
-  link={"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"}
+  link={
+    "https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"
+  }
   children={"Subscribe on BigQuery"}
 />
 
@@ -43,7 +43,7 @@ check out a specific guide:
 
 For the rest of this guide, we'll use Hex as a running example.
 
-First, you'll need to 
+First, you'll need to
 [create a service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating)
 in GCP and download the JSON key file.
 Click [here](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys&start_index=1#step_index=1)
@@ -53,7 +53,7 @@ Click "+ Create Service Account":
 
 ![create service account](./gcp_service_account_create.png)
 
-Grant this new service account the "BigQuery User" 
+Grant this new service account the "BigQuery User"
 and "BigQuery Data Viewer" roles:
 
 ![permission service account](./gcp_service_account_perm.png)
@@ -85,10 +85,9 @@ Now try running a query on the OSO dataset!
 
 ![Hex query](./hex_query.png)
 
-
 ## Share your work!
 
 Open Source Observer is a public good for lifting
 the collective intelligence of networks.
 Please share your insights and tag us!
-We love to amplify great insights.
+We love to amplify great insights.
diff --git a/apps/docs/docs/integrate/api.md b/apps/docs/docs/integrate/api.md
@@ -10,9 +10,9 @@ For data exploration, check out the guides on
 [performing queries](./query-data.mdx)
 and [Python notebooks](./python-notebooks.md).
 
-## Generate an API key
+## How to Generate an API key
 
-First, navigate to [www.opensource.observer](https://www.opensource.observer) and create a new account.
+First, go to [www.opensource.observer](https://www.opensource.observer) and create a new account.
 
 If you already have an account, log in. Then create a new personal API key:
 
@@ -22,7 +22,7 @@ If you already have an account, log in. Then create a new personal API key:
 4. You should see your brand new key. **Immediately** save this value, as you'll **never** see it again after refreshing the page.
 5. Click "Create" to save the key.
 
-You can create as many keys as you like.
+**You can create as many keys as you like.**
 
 ![generate API key](./generate-api-key.png)
 
@@ -38,7 +38,7 @@ You can navigate to our
 [public GraphQL explorer](https://cloud.hasura.io/public/graphiql?endpoint=https://opensource-observer.hasura.app/v1/graphql)
 to explore the schema and execute test queries.
 
-## Authentication
+## How to Authenticate
 
 In order to authenticate with the API service, you have to use the `Authorization` HTTP header and `Bearer` authentication on all HTTP requests, like so: