Commit

Update Get OSO data (#1736)
* Update index.mdx

* Update api.md

* Update query-data.mdx

* Update python-notebooks.md

* Update 3rd-party.mdx

* fix: run prettier across docs markdown

---------

Co-authored-by: Raymond Cheng <[email protected]>
karibaEA and ryscheng authored Jul 1, 2024
1 parent 3ad9ecd commit c38cc30
Showing 10 changed files with 194 additions and 152 deletions.
3 changes: 1 addition & 2 deletions apps/docs/docs/contribute/connect-data/dagster-config.mdx
@@ -31,7 +31,6 @@ it may look like this:

![Dagster deployment](./dagster_deployments.png)


### Run it!

If this is your first time adding an asset,
@@ -43,7 +42,7 @@ You can monitor all Dagster runs

![Dagster run example](./dagster_run.png)

Dagster also provides
[automation](https://docs.dagster.io/concepts/automation)
to run jobs on a
[schedule](https://docs.dagster.io/concepts/automation/schedules)
126 changes: 80 additions & 46 deletions apps/docs/docs/contribute/index.mdx
@@ -8,50 +8,84 @@ There are a variety of ways you can contribute to OSO. This doc features some of
:::

<table>
<thead>
<tr>
<th style={{ textAlign: "left" }}>Contribution Type</th>
<th style={{ textAlign: "left" }}>GitHub Repo</th>
<th style={{ textAlign: "left" }}>Description</th>
<th style={{ textAlign: "left" }}>Type of Contributor</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<a href="./project-data">Update Project Data</a>
</td>
<td>
<a href="https://github.com/opensource-observer/oss-directory">
oss-directory
</a>
</td>
<td>Add a new project or update info for an existing project.</td>
<td>OSS Projects, Analysts, General Public</td>
</tr>
<tr>
<td>
<a href="./connect-data/funding-data">Add Funding Data</a>
</td>
<td>
<a href="https://github.com/opensource-observer/oss-funding">
oss-funding
</a>
</td>
<td>Add to our database of OSS funding via CSV upload.</td>
<td>OSS Funders, Analysts</td>
</tr>
<tr>
<td>
<a href="./connect-data">Connect Your Data</a>
</td>
<td>
<a href="https://github.com/opensource-observer/oso">oso</a>
</td>
<td>
Write a plugin or help us replicate your data in the OSO data warehouse.
</td>
<td>Data Engineers, Developers</td>
</tr>
<tr>
<td>
<a href="./impact-models">Propose an Impact Data Model</a>
</td>
<td>
<a href="https://github.com/opensource-observer/oso">oso</a>
</td>
<td>Submit a dbt data model for tracking open source impact metrics.</td>
<td>Data Scientists, Analysts</td>
</tr>
<tr>
<td>
<a href="./share-insights">Share Insights</a>
</td>
<td>
<a href="https://github.com/opensource-observer/insights">insights</a>
</td>
<td>
Contribute to our library of data visualizations and Jupyter notebooks.
</td>
<td>Data Scientists, Analysts</td>
</tr>
<tr>
<td>
<a href="./challenges">Join a Data Challenge</a>
</td>
<td>
<a href="https://github.com/opensource-observer/insights">insights</a>
</td>
<td>
Work on a specific data challenge and get paid for your contributions.
</td>
<td>Data Scientists, Analysts</td>
</tr>
</tbody>
</table>
31 changes: 19 additions & 12 deletions apps/docs/docs/get-started/index.mdx
@@ -8,17 +8,17 @@ import Button from "../../src/components/plasmic/Button";

:::info
There are two easy ways of accessing OSO datasets: through our GraphQL API
and through our data warehouse on BigQuery.
For live integrations, you'll want [API access](../integrate/api.md).
For exploratory analysis and impact data science,
it's best to go direct to the data warehouse.
:::

OSO's data warehouse is currently located in BigQuery on Google Cloud (GCP).
Every data model is made publicly available by a BigQuery dataset.

See our [data overview](../integrate/overview/index.mdx)
for a full list of public data sets.

## Sign up for Google Cloud

@@ -45,7 +45,7 @@ Finally, you will be brought to the admin console where you can create a new pro
Feel free to name this GCP project anything you'd like.
(Or you can simply leave the default project name 'My First Project'.)

*Note: you won't be able to create a new project if you're not an administrator of your Google organization*
_Note: you won't be able to create a new project if you're not an administrator of your Google organization_

![GCP Create](./gcp_create.png)

@@ -70,7 +70,9 @@ Click on the following link to subscribe to the OSO production dataset:
size={"compact"}
color={"blue"}
target={"_blank"}
link={"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"}
link={
"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"
}
children={"Subscribe on BigQuery"}
/>

@@ -84,7 +86,7 @@ Open a new tab by clicking on the `+` icon
on the top right of the console to `Create SQL Query`.

From here you will be able to write any SQL you'd like against any OSO dataset.
For example, you can query the `oso_production` dataset for
all available collections like this:

```sql
@@ -102,12 +104,12 @@ The results will appear in a table at the bottom of the console.
The console will help you complete your query as you type, and will also provide you with a preview of the results and computation time. You can save your queries, download the results, and even make simple visualizations directly from the console.

:::tip
To explore all the OSO datasets available, see the
[Data Overview](../integrate/overview/index.mdx).

- **oso\_production** contains all production data. This can be quite large depending on the dataset.
- **oso\_playground** contains only the last 2 weeks for every dataset. We recommend using this for development and testing.
:::
- **oso_production** contains all production data. This can be quite large depending on the dataset.
- **oso_playground** contains only the last 2 weeks for every dataset. We recommend using this for development and testing.
:::

## Next steps

@@ -119,10 +121,15 @@ Now that you're set up, there are many ways to contribute to OSO and integrate t
- [Query the OSO API](../integrate/api.md) for metrics and impact vectors from your web app

If you think you'll be an ongoing contributor to OSO,
please apply to join the [Kariba Data Collective](https://www.kariba.network).

Membership is free but we want to keep the community close-knit and mission-aligned.
As the community grows, we want to reward the most useful contributions and
in so doing create a [new job category for impact data science](https://docs.opensource.observer/blog/impact-data-scientists).

<Link to="https://www.kariba.network" className="button button--secondary button--lg">Join the Data Collective</Link>
<Link
to="https://www.kariba.network"
className="button button--secondary button--lg"
>
Join the Data Collective
</Link>
10 changes: 5 additions & 5 deletions apps/docs/docs/how-oso-works/impact-metrics/index.mdx
@@ -3,9 +3,8 @@ title: Specification
sidebar_position: 1
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::info
An **impact metric** is a quantitative measure of impact over a discrete period of time. Impact metrics are most commonly queried by project (eg, `uniswap`), although they can also be queried by individual artifact or at the collection level.
@@ -19,7 +18,7 @@ Impact metrics should be designed with the following principles in mind:

- **Verifiability**: Metrics should be based on public data that can be independently verified. They should not rely on proprietary data sources or private APIs.
- **Reproducibility**: Metrics should be easy to reproduce, simulate, and audit to ensure they are achieving the intended results. They should not have a "black box" element that makes them difficult to understand or replicate.
- **Consistency**: Metrics should be consistent across projects and artifacts. They should be calculated using the same methodology and data sources to ensure that they are comparable.
- **Completeness**: Metrics should be comprehensive and cover all projects and artifacts in the OSO database that fulfill basic requirements. They should not be highly sector-specific.
- **Simplicity**: Metrics should have business logic that is easy to understand. They should not require a deep understanding of the underlying data or complex statistical methods to interpret.

@@ -37,6 +36,7 @@ An impact metric must be:
## Schema

Every impact metric must include the following fields: `project_id`, `impact_metric`, and `amount`. For example:

```json
{
"project_id": "jUda1pi-FdNlaUmgKq51B4h8x4wX3QTN2fZkKq6N0vw\u003d",
@@ -47,7 +47,6 @@ Every impact metric must include the following fields: `project_id`, `impact_met

Currently all intermediate metrics are calculated [here](https://github.com/opensource-observer/oso/tree/main/warehouse/dbt/models/intermediate/metrics) and consolidated metrics are available as metrics marts [here](https://github.com/opensource-observer/oso/tree/main/warehouse/dbt/models/marts/metrics).
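
As a rough illustration of the schema requirement above, the following Python sketch checks that a metric record carries the three required fields (`project_id`, `impact_metric`, `amount`). The helper name `missing_fields` and the sample values are hypothetical and are not part of the OSO codebase.

```python
# Minimal sketch, not OSO's own validation code.
# The three field names come from the Schema section; everything else is illustrative.
REQUIRED_FIELDS = ("project_id", "impact_metric", "amount")


def missing_fields(record: dict) -> list:
    """Return the required field names absent from `record`, in schema order."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# Hypothetical record; the id and metric name are placeholders.
example = {"project_id": "abc123", "impact_metric": "example_metric", "amount": 42}
incomplete = {"project_id": "abc123"}

print(missing_fields(example))
print(missing_fields(incomplete))
```

A check like this only confirms that the fields are present. It makes no claim about their types or about how the dbt models produce them.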


## Sample Metrics

---
@@ -172,6 +171,7 @@ Here's a more complex impact metric that uses several CTEs to calculate the numb
group by
project_id
```

</TabItem>
<TabItem value="response" label="Response">
```json
6 changes: 3 additions & 3 deletions apps/docs/docs/how-oso-works/index.mdx
@@ -13,13 +13,13 @@ This section introduces some core concepts for understanding how OSO works and w

OSO datasets are built from three primary registries: **collections**, **projects**, and **artifacts**. A **collection** is a group of projects. A **project** is a group of artifacts. An **artifact** is an open source work contribution that belongs to a project. These registries are completely public and maintained in [OSS Directory](https://github.com/opensource-observer/oss-directory).

Critically, a project can belong to many collections but *an artifact may only belong to one project*. A new project is instantiated from a unique `name` and a GitHub URL that is not owned by any other project. These properties are validated whenever a collection, project, or artifact is added to OSS Directory.
Critically, a project can belong to many collections but _an artifact may only belong to one project_. A new project is instantiated from a unique `name` and a GitHub URL that is not owned by any other project. These properties are validated whenever a collection, project, or artifact is added to OSS Directory.
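
The ownership rule above (an artifact may belong to only one project) can be sketched as a simple check. This is an illustrative Python example under assumed data shapes, not the validation that OSS Directory actually runs; the function name, project names, and URLs are hypothetical.

```python
from collections import defaultdict


def find_shared_artifacts(projects: dict) -> dict:
    """Map each artifact claimed by more than one project to its claimants.

    `projects` maps a project name to a list of artifact identifiers
    (for example GitHub URLs). This shape is an assumption for the sketch.
    """
    owners = defaultdict(list)
    for project_name, artifacts in projects.items():
        # set() so a project listing the same artifact twice is counted once
        for artifact in set(artifacts):
            owners[artifact].append(project_name)
    return {a: sorted(names) for a, names in owners.items() if len(names) > 1}


# Hypothetical registry: "alpha" and "beta" both claim the same GitHub URL.
registry = {
    "alpha": ["https://github.com/example-org-a"],
    "beta": ["https://github.com/example-org-a", "https://github.com/example-org-b"],
}
conflicts = find_shared_artifacts(registry)
print(conflicts)
```

An empty result means every artifact has exactly one owning project. No equivalent check is needed for collections, since a project may appear in many of them.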

![2_artifacts](./2_artifacts.png)

After registering an initial set of artifacts, OSO looks downstream for additional artifacts to associate with a project.

For example, when a project contains a **GitHub organization** artifact, OSO will index all repositories in that organization. When a project contains a blockchain **deployer address** artifact, OSO will index all contracts deployed by that address (including contracts deployed by factories deployed by that address).

![3_events](./3_events.png)

@@ -29,4 +29,4 @@ For example, when a project contains a **GitHub organization** artifact, OSO wil

![4_impact_metrics](./4_impact_metrics.png)

Finally, OSO aggregates this data to generate metrics and insights about the projects, collections, and artifacts in the OSS Directory. These metrics are available through the OSO website, API, and BigQuery.
17 changes: 8 additions & 9 deletions apps/docs/docs/integrate/3rd-party.mdx
@@ -5,14 +5,12 @@ sidebar_position: 5

import Button from "../../src/components/plasmic/Button";

Because all OSO datasets and models are accessible as
public datasets on BigQuery,
connecting and exploring the data
OSO datasets and models are public and can be accessed on BigQuery. This allows you to connect and explore the data using various tools.

## Subscribe to an OSO dataset

First, we need to subscribe to an OSO dataset in your own
Google Cloud account.
You can see all of our available datasets in the
[Data Overview](./overview/index.mdx).

@@ -22,7 +20,9 @@ We recommend starting with the OSO production data pipeline here:
size={"compact"}
color={"blue"}
target={"_blank"}
link={"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"}
link={
"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/oso_data_pipeline_190187c6517"
}
children={"Subscribe on BigQuery"}
/>

@@ -43,7 +43,7 @@ check out a specific guide:

For the rest of this guide, we'll use Hex as a running example.

First, you'll need to
[create a service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating)
in GCP and download the JSON key file.
Click [here](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys&start_index=1#step_index=1)
@@ -53,7 +53,7 @@ Click "+ Create Service Account":

![create service account](./gcp_service_account_create.png)

Grant this new service account the "BigQuery User"
and "BigQuery Data Viewer" roles:

![permission service account](./gcp_service_account_perm.png)
@@ -85,10 +85,9 @@ Now try running a query on the OSO dataset!

![Hex query](./hex_query.png)


## Share your work!

Open Source Observer is a public good for lifting
the collective intelligence of networks.
Please share your insights and tag us!
We love to amplify great insights.
8 changes: 4 additions & 4 deletions apps/docs/docs/integrate/api.md
@@ -10,9 +10,9 @@ For data exploration, check out the guides on
[performing queries](./query-data.mdx)
and [Python notebooks](./python-notebooks.md).

## Generate an API key
## How to Generate an API key

First, navigate to [www.opensource.observer](https://www.opensource.observer) and create a new account.
First, go to [www.opensource.observer](https://www.opensource.observer) and create a new account.

If you already have an account, log in. Then create a new personal API key:

@@ -22,7 +22,7 @@ If you already have an account, log in. Then create a new personal API key:
4. You should see your brand new key. **Immediately** save this value, as you'll **never** see it again after refreshing the page.
5. Click "Create" to save the key.

You can create as many keys as you like.
**You can create as many keys as you like.**

![generate API key](./generate-api-key.png)

@@ -38,7 +38,7 @@ You can navigate to our
[public GraphQL explorer](https://cloud.hasura.io/public/graphiql?endpoint=https://opensource-observer.hasura.app/v1/graphql)
to explore the schema and execute test queries.

## Authentication
## How to Authenticate

In order to authenticate with the API service, you have to use the `Authorization` HTTP header and `Bearer` authentication on all HTTP requests, like so:
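
As a rough illustration of the header described above, here is a minimal Python sketch that builds a GraphQL request carrying `Authorization: Bearer <key>` without sending it. The endpoint is taken from the public GraphQL explorer link; whether it is the correct URL for authenticated requests is an assumption, and `build_request`, the placeholder key, and the sample query are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint copied from the public GraphQL explorer link in this doc.
GRAPHQL_ENDPOINT = "https://opensource-observer.hasura.app/v1/graphql"


def build_request(api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request with Bearer authentication."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; "{ __typename }" is a trivial GraphQL query.
request = build_request("YOUR_API_KEY", "{ __typename }")
print(request.get_header("Authorization"))
```

To send the request, pass it to `urllib.request.urlopen(request)`. The sketch stops short of that so it can run without network access or a real key.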
