
Commit

adds arrow docs to index
rudolfix committed Oct 17, 2023
1 parent 8be4891 commit b4365dd
Showing 3 changed files with 13 additions and 15 deletions.
25 changes: 11 additions & 14 deletions docs/website/docs/dlt-ecosystem/verified-sources/arrow-pandas.md
@@ -9,12 +9,12 @@ keywords: [arrow, pandas, parquet, source]
You can load data directly from an Arrow table or Pandas dataframe.
This is supported by all destinations that support the parquet file format (e.g. [Snowflake](../destinations/snowflake.md) and [Filesystem](../destinations/filesystem.md)).

-This is a more performance way to load structured data since dlt bypasses many processing steps normally involved in passing JSON objects through the pipeline.
-Dlt automatically translates the Arrow table's schema to the destination table's schema and writes the table to a parquet file which gets uploaded to the destination without any further processing.
+This is a more performant way to load structured data since `dlt` bypasses many processing steps normally involved in passing JSON objects through the pipeline.
+`dlt` automatically translates the Arrow table's schema to the destination table's schema and writes the table to a parquet file which gets uploaded to the destination without any further processing.

## Usage

-To write an Arrow source, pass any `pyarrow.Table` or `pandas.DataFrame` object to the pipeline's `run` or `extract` method, or yield table(s)/dataframe(s) from a `@dlt.resource` decorated function.
+To write an Arrow source, pass any `pyarrow.Table`, `pyarrow.RecordBatch`, or `pandas.DataFrame` object (or a list thereof) to the pipeline's `run` or `extract` method, or yield table(s)/dataframe(s) from a `@dlt.resource`-decorated function.

This example loads a Pandas dataframe to a Snowflake table:
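
The example code itself is collapsed in this diff view. A minimal sketch of such a load, assuming a small illustrative dataframe and a hypothetical pipeline named `orders_pipeline`, might look like this:

```py
import dlt
import pandas as pd

# Hypothetical sample data; any dataframe with destination-compatible types works.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "carol"],
    "amount": [10.5, 20.0, 7.25],
})

# The dataframe is converted to Arrow, written to a parquet file, and uploaded as-is.
pipeline = dlt.pipeline("orders_pipeline", destination="snowflake")
info = pipeline.run(df, table_name="orders")
print(info)
```

Swapping `destination` for any other parquet-capable destination from the list below should work the same way.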

@@ -50,6 +50,14 @@ pipeline.run([table], table_name="orders")

Note: The data in the table must be compatible with the destination database as no data conversion is performed. Refer to the documentation of the destination for information about supported data types.

## Destinations that support parquet for direct loading
* duckdb & motherduck
* redshift
* bigquery
* snowflake
* filesystem
* athena

## Incremental loading with Arrow tables

You can use incremental loading with Arrow tables as well.
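
The details are collapsed in this diff, so the following is only a sketch of the general pattern, assuming a hypothetical `orders` resource with an `ordered_at` timestamp column as the incremental cursor:

```py
import dlt
import pandas as pd
import pyarrow as pa

# Hypothetical data with a timestamp column used as the incremental cursor.
table = pa.Table.from_pandas(pd.DataFrame({
    "order_id": [1, 2],
    "ordered_at": pd.to_datetime(["2023-10-01", "2023-10-02"], utc=True),
}))

@dlt.resource(primary_key="order_id")
def orders(ordered_at=dlt.sources.incremental("ordered_at")):
    # Only rows with ordered_at above the last stored cursor value load on repeat runs.
    yield table

pipeline = dlt.pipeline("arrow_incremental", destination="duckdb")
print(pipeline.run(orders))
```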
@@ -94,14 +102,3 @@ The Arrow data types are translated to dlt data types as follows:
| `decimal` | `decimal` | Precision and scale are determined by the type properties. |
| `struct` | `complex` | |
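
As a small illustration of this mapping (a sketch with a hypothetical `items` table and a `duckdb` destination), a decimal column and a struct column load as `decimal` and `complex` respectively:

```py
from decimal import Decimal

import dlt
import pyarrow as pa

# "price" maps to a dlt decimal column; "meta" (a struct) maps to a complex column.
table = pa.table({
    "price": pa.array([Decimal("10.50"), Decimal("7.25")], type=pa.decimal128(10, 2)),
    "meta": pa.array([{"source": "api", "version": 1}, {"source": "csv", "version": 2}]),
})

pipeline = dlt.pipeline("arrow_types_demo", destination="duckdb")
pipeline.run(table, table_name="items")
```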
2 changes: 1 addition & 1 deletion docs/website/docs/getting-started.md
@@ -444,7 +444,7 @@ Each event type is sent to a different table in `duckdb`.
import dlt
from dlt.sources.helpers import requests

-@dlt.resource(primary_key="id", table_name=lambda i: i["type"], write_disposition="append") # type: ignore
+@dlt.resource(primary_key="id", table_name=lambda i: i["type"], write_disposition="append")
def repo_events(
    last_created_at = dlt.sources.incremental("created_at")
):
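
The body of the resource is collapsed above. A self-contained sketch of the same dispatch pattern, substituting a hypothetical in-memory event list for the GitHub API calls used in the guide, might look like this:

```py
import dlt

# Hypothetical events; the getting-started guide fetches these from the GitHub API instead.
events = [
    {"id": 1, "type": "PushEvent", "created_at": "2023-10-01T00:00:00Z"},
    {"id": 2, "type": "IssuesEvent", "created_at": "2023-10-02T00:00:00Z"},
]

# table_name is evaluated per item, so each event type lands in its own table.
@dlt.resource(primary_key="id", table_name=lambda i: i["type"], write_disposition="append")
def repo_events(last_created_at=dlt.sources.incremental("created_at")):
    # On repeat runs, dlt keeps only items whose created_at exceeds the stored cursor value.
    yield from events

pipeline = dlt.pipeline("github_events", destination="duckdb")
print(pipeline.run(repo_events))
```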
1 change: 1 addition & 0 deletions docs/website/sidebars.js
@@ -37,6 +37,7 @@ const sidebars = {
},
items: [
  'dlt-ecosystem/verified-sources/airtable',
+  'dlt-ecosystem/verified-sources/arrow-pandas',
  'dlt-ecosystem/verified-sources/asana',
  'dlt-ecosystem/verified-sources/chess',
  'dlt-ecosystem/verified-sources/facebook_ads',
