docs: Add new data sources to Data Overview (#1752)
* Gitcoin, OpenRank, Farcaster, Lens are now on the Data Exchange.
* Update the Data Overview to showcase these datasets
* Minor text edits
ryscheng authored Jul 3, 2024
1 parent cb34ee5 commit e584a16
Showing 5 changed files with 162 additions and 51 deletions.
8 changes: 4 additions & 4 deletions apps/docs/docs/get-started/index.mdx
@@ -115,10 +115,10 @@ To explore all the OSO datasets available, see the

Now that you're set up, there are many ways to contribute to OSO and integrate the data with your application:

- [SQL Query Guide](../integrate/query-data.mdx)
- [Write Python notebooks](../integrate/python-notebooks.md)
- [Propose an impact model](../contribute/impact-models.md) to run in our data pipeline
- [Query the OSO API](../integrate/api.md) for metrics and impact vectors from your web app
- [SQL Query Guide](../integrate/query-data.mdx) for more details on running effective queries
- [Write Python notebooks](../integrate/python-notebooks.md) for advanced analysis and visualizations
- [Propose an impact model](../contribute/impact-models.md) to run in the OSO data pipeline
- [Query the OSO API](../integrate/api.md) for impact metrics and project info to integrate into your application

If you think you'll be an ongoing contributor to OSO,
please apply to join the [Kariba Data Collective](https://www.kariba.network).
5 changes: 4 additions & 1 deletion apps/docs/docs/integrate/3rd-party.mdx
@@ -29,7 +29,10 @@ We recommend starting with the OSO production data pipeline here:
## Connect your third party tool

BigQuery has built-in support in many BI, notebook, and
data analysis tools. To see how to connect to a specific tool,
data analysis tools. These tools typically offer visualization
and exploration features well beyond what you can do in BigQuery Studio.

To see how to connect to a specific tool,
check out a specific guide:

- [Tableau](https://cloud.google.com/bigquery/docs/analyze-data-tableau)
8 changes: 4 additions & 4 deletions apps/docs/docs/integrate/api.md
@@ -1,12 +1,12 @@
---
title: Use the GraphQL API
sidebar_position: 2
sidebar_position: 10
---

The OSO API currently only allows read-only GraphQL queries against OSO mart models
(e.g. impact metrics, project info).
The OSO API currently only allows read-only GraphQL queries against a subset
of OSO data (i.e. only mart models like impact metrics, project info).
This API should only be used to fetch data to integrate into a live application in production.
For data exploration, check out the guides on
If you want access to the full dataset for data exploration, check out the guides on
[performing queries](./query-data.mdx)
and [Python notebooks](./python-notebooks.md).

2 changes: 1 addition & 1 deletion apps/docs/docs/integrate/index.md
@@ -8,9 +8,9 @@ That means all source code, data, and infrastructure is publicly available for u

- [**Get Started**](../get-started/index.mdx): to set up your Google account for data access and run your first query
- [**Data Overview**](./overview/index.mdx): for an overview of all data available
- [**API access**](./api.md): to integrate OSO metrics into a live production application
- [**SQL Query Guide**](./query-data.mdx): to quickly query and download any data
- [**Python notebooks**](./python-notebooks.md): to do more in-depth data science and processing
- [**Connect OSO to 3rd Party tools**](./3rd-party.mdx): like Hex.tech, Tableau, and Metabase
- [**Fork the data pipeline**](./fork-pipeline.md): to set up your own data pipeline off any OSO model
- [**API access**](./api.md): to integrate OSO metrics into a live production application
- [**oss-directory**](./oss-directory.md): to leverage [oss-directory](https://github.com/opensource-observer/oss-directory) data separate from OSO
190 changes: 149 additions & 41 deletions apps/docs/docs/integrate/overview/index.mdx
@@ -19,8 +19,9 @@ page.

## OSO Data Exchange on Analytics Hub

To explore all the OSO datasets available on our BigQuery data exchange,
see [here](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae).
To explore all the OSO datasets available,
check out our
[BigQuery data exchange](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae).

## OSO Production Data Pipeline

@@ -36,17 +37,17 @@ see [here](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/pro
children={"Subscribe on BigQuery"}
/>{" "}

- [Reference documentation](https://models.opensource.observer/#!/model/model.opensource_observer.code_metrics_by_project_v1)
- [License: CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
- [Updated daily](https://dagster.opensource.observer/assets/dbt/production)

Every stage of the OSO data pipeline is queryable and downloadable.
Like most dbt-based pipelines, we split the pipeline stages into
[staging, intermediate, and mart models](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview).

You can find the reference documentation on every data model on
[https://models.opensource.observer/](https://models.opensource.observer/)

The data produced by the OSO data pipeline is released under the
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
license.

### OSO Mart Models

These are the final product from the data pipeline,
@@ -109,6 +110,10 @@ GROUP BY project_id, event_source
children={"Subscribe on BigQuery"}
/>{" "}

- [Reference documentation](https://models.opensource.observer/#!/model/model.opensource_observer.code_metrics_by_project_v1)
- [License: CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
- [Updated daily](https://dagster.opensource.observer/assets/dbt/playground)

We maintain a subset of projects and events in a playground dataset for testing and development.
All of the production models are mirrored in this environment.

@@ -128,11 +133,15 @@ All of the production models are mirrored in this environment.
children={"View on BigQuery"}
/>{" "}

[Reference documentation](https://models.opensource.observer/#!/source_list/github_archive)
- [Reference documentation](https://models.opensource.observer/#!/source_list/github_archive)
- Code License: [MIT](https://github.com/igrigorik/gharchive.org/blob/master/LICENSE.md).
- Data governed by the GitHub
[terms of service](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service).
- [Updated hourly](https://github.com/igrigorik/gharchive.org/blob/master/bigquery/README.md)

GitHub data is predominantly provided by the incredible
[GH Archive](https://www.gharchive.org/) project, which
maintains a BigQuery public dataset that is refreshed every hour.
[GH Archive](https://www.gharchive.org/) project, which publishes a public
archive of historical GitHub events.

For example, to count the number of issues opened, closed, and reopened on 2020/01/01:

@@ -146,10 +155,6 @@ SELECT event as issue_status, COUNT(*) as cnt FROM (
GROUP by issue_status;
```

The underlying GitHub data is governed by the GitHub
[terms of service](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service).
GH Archive code and documentation are covered by the
[MIT license](https://github.com/igrigorik/gharchive.org/blob/master/LICENSE.md).

### Ethereum Data

@@ -165,7 +170,8 @@ GH Archive code and documentation are covered by the
children={"View on BigQuery"}
/>{" "}

[Reference documentation](https://models.opensource.observer/#!/source_list/ethereum)
- [Reference documentation](https://models.opensource.observer/#!/source_list/ethereum)
- Code License: [MIT](https://github.com/blockchain-etl/ethereum-etl/blob/develop/LICENSE).

The Google Cloud team maintains a public
[Ethereum dataset](https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics).
@@ -187,8 +193,6 @@ order by block_number desc
limit 10
```

ethereum-etl code is covered by the
[MIT license](https://github.com/blockchain-etl/ethereum-etl/blob/develop/LICENSE).

### Superchain Data

@@ -204,88 +208,192 @@ ethereum-etl code is covered by the
children={"Subscribe on BigQuery"}
/>{" "}

- Code License: [Apache-2.0](https://github.com/opensource-observer/oso/blob/main/LICENSE)
- Data governed by the OSO
[terms of service](https://www.opensource.observer/terms)
- [Updated daily](https://github.com/igrigorik/gharchive.org/blob/master/bigquery/README.md)

OSO is proud to provide public datasets for the Superchain,
backed by our partners at
[Goldsky](https://goldsky.com/).

We currently have coverage for:
We currently provide blocks, transactions, and traces for the following networks:

- [Optimism mainnet](https://www.optimism.io/)
([Reference docs](https://models.opensource.observer/#!/source_list/superchain)
, [Updated daily](https://dagster.opensource.observer/assets/optimism))
- [Base](https://www.base.org/)
([Reference docs](https://models.opensource.observer/#!/source_list/base)
, [Updated daily](https://dagster.opensource.observer/assets/base))
- [Frax](https://www.frax.com/)
([Reference docs](https://models.opensource.observer/#!/source_list/frax)
, [Updated daily](https://dagster.opensource.observer/assets/frax))
- [Metal](https://metall2.com/)
([Reference docs](https://models.opensource.observer/#!/source_list/metal)
, [Updated daily](https://dagster.opensource.observer/assets/metal))
- [Mode](https://www.mode.network/)
([Reference docs](https://models.opensource.observer/#!/source_list/mode)
, [Updated daily](https://dagster.opensource.observer/assets/mode))
- [PGN](https://publicgoods.network/)
([Reference docs](https://models.opensource.observer/#!/source_list/pgn)
, [Updated daily](https://dagster.opensource.observer/assets/pgn))
- [Zora](https://zora.co/)
([Reference docs](https://models.opensource.observer/#!/source_list/zora)
, [Updated daily](https://dagster.opensource.observer/assets/zora))


For example, to get deployed contracts from a particular address on the Base network:

- [Optimism mainnet](https://models.opensource.observer/#!/source_list/superchain)
- [Base](https://models.opensource.observer/#!/source_list/base)
- [Frax](https://models.opensource.observer/#!/source_list/frax)
- [Metal](https://models.opensource.observer/#!/source_list/metal)
- [Mode](https://models.opensource.observer/#!/source_list/mode)
- [PGN](https://models.opensource.observer/#!/source_list/pgn)
- [Zora](https://models.opensource.observer/#!/source_list/zora)
```sql
select
traces.block_timestamp,
traces.transaction_hash,
txs.from_address as originating_address,
txs.to_address as originating_contract,
traces.from_address as factory_address,
traces.to_address as contract_address
from `YOUR_PROJECT_NAME.superchain.base_traces` as traces
inner join transactions as txs
on txs.hash = traces.transaction_hash
where
LOWER(traces.from_address) != "0x3fab184622dc19b6109349b94811493bf2a45362"
and LOWER(trace_type) in ("create", "create2")
```

For terms of use, please see the OSO
[terms and conditions](https://www.opensource.observer/terms).
**Remember to replace 'YOUR_PROJECT_NAME' with the name of your project in the query.**
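
If you run these queries from a script or notebook rather than pasting them into the console, the substitution can be automated. The sketch below is one way to do it in Python. `fill_project` is a hypothetical helper, not part of any OSO or BigQuery library, and its input check is only a rough sanity check, not a full validation of Google Cloud project IDs.

```python
PLACEHOLDER = "YOUR_PROJECT_NAME"


def fill_project(query: str, project_id: str) -> str:
    """Return `query` with every YOUR_PROJECT_NAME placeholder replaced.

    Hypothetical helper for illustration only.
    """
    # Rough sanity check (an assumption, not the official ID rules):
    # reject empty IDs and characters that would break a backtick-quoted name.
    if not project_id or any(ch in project_id for ch in "` \t\n"):
        raise ValueError(f"suspicious project id: {project_id!r}")
    # Fail loudly if the query has nothing to substitute, so a query that was
    # already edited by hand is not silently passed through.
    if PLACEHOLDER not in query:
        raise ValueError("query does not contain the placeholder")
    return query.replace(PLACEHOLDER, project_id)


if __name__ == "__main__":
    template = "select * from `YOUR_PROJECT_NAME.superchain.base_traces` limit 10"
    # "my-gcp-project" is a made-up project ID.
    print(fill_project(template, "my-gcp-project"))
```

The resulting string can then be passed to whichever BigQuery client you already use.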

### Farcaster Data

<img src={FarcasterLogo} width="100" />

[Reference documentation](https://models.opensource.observer/#!/source_list/farcaster)
<Button
size={"compact"}
color={"blue"}
target={"_blank"}
link={
"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/farcaster_19076cb8a53"
}
children={"Subscribe on BigQuery"}
/>{" "}

- [Reference documentation](https://models.opensource.observer/#!/source_list/farcaster)
- [Updated weekly](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/farcaster.py)

[Farcaster](https://www.farcaster.xyz/) is a decentralized social network built on Ethereum.
This dataset mirrors the dataset offered by [Indexing](https://blog.indexing.co/posts/IaPkkuevwwfgBWtZ3F7eg5oQUqyV_o6sLDo28oEV8Tg)
for use in the OSO data pipeline.
It includes casts, links, reactions, verifications, and profiles.

For example, to get the users with the most lifetime reactions:

```sql
SELECT
r.target_cast_fid as fid,
json_value(p.data, "$.display") as display_name,
COUNT(*) as reaction_count
FROM `YOUR_PROJECT_NAME.farcaster.reactions` as r
LEFT JOIN `YOUR_PROJECT_NAME.farcaster.profiles` as p ON r.target_cast_fid = p.fid
GROUP BY fid, display_name
ORDER BY reaction_count DESC
```

:::warning
Coming soon...
:::
**Remember to replace 'YOUR_PROJECT_NAME' with the name of your project in the query.**

### Lens Data

<img src={LensLogo} width="200" />

[Reference documentation](https://models.opensource.observer/#!/source_list/lens)
<Button
size={"compact"}
color={"blue"}
target={"_blank"}
link={
"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/lens_19077b0822c"
}
children={"Subscribe on BigQuery"}
/>{" "}

:::warning
Coming soon...
:::
- [Reference documentation](https://models.opensource.observer/#!/source_list/lens)
- [Updated weekly](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/lens.py)

[Lens Protocol](https://www.lens.xyz/) is an open social network.
This dataset mirrors the dataset offered by [Lens](https://www.lens.xyz/docs/tools/bigquery)
for use in the OSO data pipeline.
It includes data from the Polygon network.

### Gitcoin Passport Data

<img src={GitcoinLogo} width="200" />

[Reference documentation](https://models.opensource.observer/#!/source_list/gitcoin)
<Button
size={"compact"}
color={"blue"}
target={"_blank"}
link={
"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/gitcoin_passport_19077b6ad59"
}
children={"Subscribe on BigQuery"}
/>{" "}

- [Reference documentation](https://models.opensource.observer/#!/source_list/gitcoin)
- [Updated daily](https://dagster.opensource.observer/assets/gitcoin)

[Gitcoin Passport](https://passport.gitcoin.co/)
is a web3 identity verification protocol.
OSO and Gitcoin have collaborated to make this dataset
of address scores available for use in understanding user reputations.

For example, you can get vitalik.eth's passport score:
For example, you can get **vitalik.eth's** passport score:

```sql
select
passport_address,
last_score_timestamp,
evidence_rawScore,
evidence_threshold,
from opensource-observer.gitcoin.passport_scores
from YOUR_PROJECT_NAME.gitcoin.passport_scores
where passport_address = '0xd8da6bf26964af9d7eed9e03e53415d37aa96045'
```

**Remember to replace 'YOUR_PROJECT_NAME' with the name of your project in the query.**

### OpenRank Data

<img src={OpenrankLogo} width="200" />

[Reference documentation](https://models.opensource.observer/#!/source_list/karma3)
<Button
size={"compact"}
color={"blue"}
target={"_blank"}
link={
"https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae/listings/openrank_19077ba1f3f"
}
children={"Subscribe on BigQuery"}
/>{" "}

- [Reference documentation](https://models.opensource.observer/#!/source_list/karma3)
- [Updated daily](https://dagster.opensource.observer/assets/karma3)

[OpenRank](https://openrank.com/) is a decentralized reputation protocol based on
[Eigentrust](https://en.wikipedia.org/wiki/EigenTrust).
In this dataset, we scored Farcaster IDs.
In this dataset, Farcaster users' reputations are scored in two ways:
- With `globaltrust`, we calculate global reputation scores, seeded by the trust of Optimism badgeholders.
- With `localtrust`, you can get reputation scores of other users relative to a specified user.
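
For intuition about what a seeded global trust score means, here is a textbook-style EigenTrust iteration in plain Python. It is only a sketch under assumptions: the function name, the `alpha` weight, and the toy peer names are illustrative, and it is neither OpenRank's implementation nor the method used to produce the `globaltrust` table.

```python
def eigentrust(local_trust, pretrusted, alpha=0.5, iterations=50):
    """Textbook EigenTrust sketch (illustrative, not OpenRank's code).

    local_trust: dict mapping peer i -> {peer j: non-negative trust i places in j}
    pretrusted:  iterable of seed peers (in the docs' case, the analogue would
                 be the Optimism badgeholders)
    alpha:       weight given to the seed distribution on every step (assumed)
    """
    seeds = set(pretrusted)
    peers = sorted(
        set(local_trust)
        | {j for row in local_trust.values() for j in row}
        | seeds
    )
    # Seed distribution p: uniform over the pre-trusted peers.
    p = {peer: (1.0 / len(seeds) if peer in seeds else 0.0) for peer in peers}

    # Normalize each peer's outgoing trust so it sums to 1.
    # Peers that trust nobody fall back to the seed distribution.
    c = {}
    for i in peers:
        row = local_trust.get(i, {})
        total = sum(row.values())
        c[i] = {j: v / total for j, v in row.items()} if total > 0 else dict(p)

    # Iterate t <- (1 - alpha) * C^T t + alpha * p, starting from the seeds.
    t = dict(p)
    for _ in range(iterations):
        nxt = {j: alpha * p[j] for j in peers}
        for i in peers:
            for j, w in c[i].items():
                nxt[j] += (1 - alpha) * w * t[i]
        t = nxt
    return t


if __name__ == "__main__":
    # Toy graph: a trusts b, b trusts c, c trusts a; only a is pre-trusted.
    scores = eigentrust({"a": {"b": 1}, "b": {"c": 1}, "c": {"a": 1}}, ["a"])
    for peer, score in sorted(scores.items()):
        print(peer, round(score, 4))
```

In this toy graph the seed peer ends up with the highest score and trust decays along the chain, which is the general effect of seeding global scores with a pre-trusted set.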

For example, you can get the reputational score of vitalik.eth
For example, you can get the globaltrust reputation score of **vitalik.eth**:

```sql
select
strategy_id,
i,
v,
date
from opensource-observer.karma3.globaltrust
from YOUR_PROJECT_NAME.karma3.globaltrust
where i = 5650
```

**Remember to replace 'YOUR_PROJECT_NAME' with the name of your project in the query.**

## Subscribe to a dataset

### 1. Data exchange listings