Skip to content

Commit

Permalink
Merge pull request #477 from elbwalker/bigquery-destination-docs
Browse files Browse the repository at this point in the history
Bigquery destination docs
  • Loading branch information
alexanderkirtzel authored Dec 13, 2024
2 parents a14c73c + 42a2b9e commit 04b7d22
Showing 1 changed file with 95 additions and 30 deletions.
125 changes: 95 additions & 30 deletions website/docs/destinations/bigquery.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,75 @@ Before using the BigQuery destination, ensure you have:
- Permissions to write to BigQuery
- Run the setup function to create the table

### Permissions

When using Service Accounts (SAs) for Google Cloud BigQuery, it's recommended to
follow the principle of _least privilege_. Never grant more permissions than
what it needs to perform its intended functions.

During initial setup, the SA may require broader permissions to create necessary
datasets. Typically, this involves assigning a role like
`roles/bigquery.dataOwner` to the service account. This role can be granted
through Google Cloud IAM.

For production environments, it is recommended to **revoke broader IAM roles**
granted during the setup phase. Assign explicit permissions directly to datasets
within BigQuery (using the share option). This ensures that the service account
only has access to what is necessary for operation. The service account may hold
owner permissions after creating the dataset.

:::tip

For more detailed information, refer to the official
[Google Cloud IAM documentation](https://cloud.google.com/iam/docs).

:::

### Create table

The destination requires an existing dataset and table to ingest data into.
Replace `PR0J3CT1D.walkerOS.events` with your actual project ID, dataset, and
table's name. Adjust the options if necessary, and run a query to create it.

```sql
CREATE TABLE `PR0J3CT1D.walkerOS.events` (
timestamp TIMESTAMP NOT NULL,
event STRING NOT NULL,
data JSON,
context JSON,
globals JSON,
custom JSON,
user JSON,
nested JSON,
consent JSON,
id STRING,
trigger STRING,
entity STRING,
action STRING,
timing NUMERIC,
`group` STRING,
count NUMERIC,
version JSON,
source JSON,
createdAt TIMESTAMP NOT NULL
)
PARTITION BY DATE(timestamp)
OPTIONS (
description="walkerOS raw events",
partition_expiration_days=365, -- Automatically delete data older than 1 year
require_partition_filter=true -- Enforce the use of partition filter in queries
);
```

:::tip

If you need to create a new dataset, consider to actively enable **physical
storage** billing model to eventually reduce your BigQuery costs. Based on your
events a compression factor of 6 is possible, but may result in higher querying
costs.

:::

## Installation

To get started, install the package via npm:
Expand Down Expand Up @@ -55,44 +124,40 @@ await elb('walker destination', destinationBigQuery, {

:::tip

Learn more how to
Learn how to
[authenticate with a service account key file](https://cloud.google.com/bigquery/docs/authentication/service-account-file)
using the custom `bigquery` options.

:::

```ts
{
custom: {
projectId: 'PR0J3CT1D', // Required
bigquery: {

}
}
}
```

## Permissions

When using Service Accounts (SAs) for Google Cloud BigQuery, it's recommended to
follow the principle of _least privilege_. Never grant more permissions than
what it needs to perform its intended functions.

During initial setup, the SA may require broader permissions to create necessary
datasets. Typically, this involves assigning a role like
`roles/bigquery.dataOwner` to the service account. This role can be granted
through Google Cloud IAM.

For production environments, it is recommended to **revoke broader IAM roles**
granted during the setup phase. Assign explicit permissions directly to datasets
within BigQuery (using the share option). This ensures that the service account
only has access to what is necessary for operation. The service account may hold
owner permissions after creating the dataset.
## Schema

| Field name | Type | Description | Example value |
| ----------- | --------- | ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `timestamp` | TIMESTAMP | Time when the event fired | `2024-09-09 13:19:48.462000 UTC` |
| `event` | STRING | Name of the event as a combination of entity and action | `promotion visible` |
| `data` | JSON | Arbitrary properties related to the entity | `{"name":"Set up tracking easily","interactive":false}` |
| `context` | JSON | Provides additional information about the state during the event | `{"stage":["learning",1],"test":["engagement",0]}` |
| `custom` | JSON | Additional space for individual setups | `null` |
| `globals` | JSON | General properties that apply to every event | `{"language":"en"}` |
| `user` | JSON | Contains user identifiers for different identification levels | `{"id":"us3r1d","device":"c00k131d","session":"s3ss10n1d"}` |
| `nested` | JSON | All nested entities within the main entity | `[{"type":"github",data:{"repo": "walkerOS"}}]` |
| `consent` | JSON | Status of the granted consent state(s) | `{"functional": true }` |
| `id` | STRING | Timestamp, group & count of the event | `1725887988462-u7jpt6-2` |
| `trigger` | STRING | Name of the trigger that fired | `load` |
| `entity` | STRING | Name of the entity | `promotion` |
| `action` | STRING | Name of the action | `visible` |
| `timing` | NUMERIC | Duration how long it took to trigger this event | `1.77` |
| `group` | STRING | Random identifier for all events during a run | `u7jpt6` |
| `count` | NUMERIC | Incremental counter of the events in the same run | `2` |
| `version` | JSON | Information about the used implementation setup | `{"source": "X.X.X", "tagging": 42}` |
| `source` | JSON | Details about the origin of the event | `{"type": "web","id": "https://github.com/elbwalker/walkerOS","previous_id": "https://www.elbwalker.com/"}` |
| `createdAt` | TIMESTAMP | Time when the event was received | `2024-09-09 13:19:48.880000 UTC` |

:::tip

For more detailed information, refer to the official
[Google Cloud IAM documentation](https://cloud.google.com/iam/docs).
Learn more about the&nbsp;<Link to="/docs/getting_started/event-model">event
model</Link>

:::

Expand Down

0 comments on commit 04b7d22

Please sign in to comment.