Releases: dlt-hub/dlt
0.3.16
Core Library
- add default user agent header to dlt requests client by @sh-rp in #595
- Add pydantic support by @steinitzu in #589
  You can use pydantic models to define table schemas and load pydantic instances just like dictionaries.
- NormalizerInfo: per-table item counts are now present in the trace by @sh-rp in #582
  Get the counts of items added to each table during the normalization stage.
- Add azure blob storage filesystem/staging destination by @steinitzu in #592
  Also includes Snowflake stage support.
- general state sync interface by @sh-rp in #564
  You can now restore state and schemas from Weaviate (filesystem comes later).
- uses botocore instead of boto3 in AwsCredentials by @rudolfix in #590
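The per-table counts reported by the normalization stage can be pictured with a minimal sketch in plain Python. It does not use dlt's NormalizerInfo API (its attribute names are not given here); the helper count_items_per_table and the (table_name, row) input shape are hypothetical assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_items_per_table(items: Iterable[Tuple[str, dict]]) -> Dict[str, int]:
    """Count how many normalized items were routed to each table (hypothetical helper)."""
    # Each element is assumed to be a (table_name, row) pair emitted by normalization.
    return dict(Counter(table for table, _row in items))


normalized = [
    ("issues", {"id": 1}),
    ("issues", {"id": 2}),
    ("issues__labels", {"name": "bug"}),  # example child table name
]
print(count_items_per_table(normalized))
```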
Docs
- Update pseudonymizing_columns.md by @wtfzambo in #598
- Docs: getting started by @AstrakhantsevaAA in #568
We now have a really nice getting started guide: https://dlthub.com/docs/getting-started
New Verified Sources
- A really nice Airtable source by @willi-mueller in dlt-hub/verified-sources#218
  Thanks for the amazing contribution!
New Contributors
Full Changelog: 0.3.13...0.3.16
0.3.13
Core Library
- Feat: don't require AWS credentials for external Snowflake stage by @codingcyclist in #587
- connecting to local Weaviate made easy by @rudolfix in #591
- allows setting table name via property on DltResource by @rudolfix in #593
- destination tests refactored by @sh-rp in #572
Docs
- docs snippets and examples are now linted and tested by @sh-rp in #559
- several blog posts and verified sources docs updates by @adrianbr and @dat-a-man
New Verified Sources
- MongoDB source that works the same way as the SQL database source by @sehnem in dlt-hub/verified-sources#239
New Contributors
Full Changelog: 0.3.12...0.3.13
0.3.12
Core Library
In this version we release two new types of destinations:
- Add a Weaviate destination by @burnash in #479
  A vector data store: load and query vectorized text data.
- Basic AWS Athena support by @sh-rp in #522
  A data lake destination which works together with filesystem as staging.
Apart from that, bug fixes:
- fixes airflow provider init sequence by @rudolfix in #569
- fixes transformer decorator typings by @rudolfix in #554
Docs
- We improved documentation for many verified sources (thanks @dat-a-man and @AstrakhantsevaAA)
- updates contribution and readme + small docs fixes by @rudolfix in #553
- Edit weaviate docs by @hsm207 in #566
New Contributors
Full Changelog: 0.3.10...0.3.12
0.3.10
Core Library
- Fix config dataclasses on python 3.11 by @steinitzu in #541
  Python 3.11 is now fully tested on CI.
- removes optional dependencies by @rudolfix in #552
  sentry-sdk and several dependencies used by the dlt deploy command were moved to extras. Several others (including fsspec) have their minimal version set to earlier versions.
- The PR above also fixes #539 and #540
Full Changelog: 0.3.9...0.3.10
0.3.9
Bugfix Release
When replace with a staging dataset was used in version 0.3.8, tables with other write dispositions were also truncated (in other words, all the tables in the schema could be truncated). Note that the default replace strategy does not use a staging dataset, so if you did not explicitly change it, you were not affected.
This release fixes that bug. If you use the replace strategy above, update the library.
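The intended behavior after the fix can be sketched in plain Python: only tables whose write disposition is replace are selected for truncation. This is a hypothetical illustration, not dlt's code; the helper tables_to_truncate and the mapping of table names to dispositions are assumptions.

```python
from typing import Dict, List


def tables_to_truncate(write_dispositions: Dict[str, str]) -> List[str]:
    """Return only the tables whose write disposition is 'replace' (hypothetical helper)."""
    # Tables using 'append' or 'merge' must never be truncated by a replace step.
    return [
        name
        for name, disposition in write_dispositions.items()
        if disposition == "replace"
    ]


schema_tables = {"users": "replace", "events": "append", "orders": "merge"}
print(tables_to_truncate(schema_tables))
```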
Full Changelog: 0.3.8...0.3.9
0.3.8
Core Library
- use Airflow (and possibly other) schedulers with dlt resources by @rudolfix in #534
  A really cool feature that allows your incremental loading to take date ranges from Airflow schedulers. Do backfilling and incremental loading, and rely on Airflow to keep the pipeline state.
- Ignore hints prefixed with 'x-' in table_schema() by @burnash in #525
- Now our CI works correctly from forks! by @steinitzu in #530
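Ignoring x- prefixed hints amounts to filtering them out of a table dictionary. The sketch below is a hypothetical plain-Python illustration, not dlt's table_schema() implementation; the helper name and the example keys (including x-normalizer) are assumptions, and it only inspects top-level keys.

```python
from typing import Any, Dict


def strip_extension_hints(table: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a table dict without top-level keys prefixed with 'x-' (hypothetical)."""
    return {key: value for key, value in table.items() if not key.startswith("x-")}


table = {
    "name": "events",
    "write_disposition": "append",
    "x-normalizer": {"seen-data": True},  # example extension hint
}
print(strip_extension_hints(table))
```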
Support for unstructured data!
A really cool data source that lets you ask questions about your PDF documents and stores the answers in any of our destinations. It goes from binary blobs through unstructured.io, vector databases and LLM queries to destinations such as duckdb and bigquery. Blobs can come from the filesystem, Google Drive or your inbox (also incrementally). By @AstrakhantsevaAA
0.3.6
Core Library
- fixes lost data and incorrect handling of child tables during truncate-and-insert replace by @sh-rp in #499
  This is an important improvement that fixes a few holes in truncate-and-insert replace mode (which was there from the beginning of dlt). Now we truncate all the tables before the multithreaded append process starts. We also truncate child tables, which previously could be left with data.
  details: #263 #271
- fixes deploy airflow secrets and makes toml the default layout by @rudolfix in #513
- check the required verified source dlt version during dlt init and warn users by @steinitzu in #514
- add schema version to _dlt_loads table by @codingcyclist in #466
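Collecting every table to truncate, including child tables, before any parallel append begins can be sketched as follows. This is a hypothetical plain-Python illustration, not dlt's implementation; it assumes child tables are named parent__child, and it uses a set so each table is truncated only once.

```python
from typing import Iterable, Set


def tables_to_truncate(replaced: Iterable[str], all_tables: Iterable[str]) -> Set[str]:
    """Collect replaced tables plus all of their child tables, each exactly once (hypothetical)."""
    replaced_set = set(replaced)
    result: Set[str] = set()
    for table in all_tables:
        # Assumed naming: a child table is '<parent>__<child>', possibly nested deeper.
        root = table.split("__", 1)[0]
        if table in replaced_set or root in replaced_set:
            result.add(table)
    return result


schema = ["issues", "issues__labels", "issues__labels__meta", "users"]
print(sorted(tables_to_truncate(["issues"], schema)))
```

Truncating this whole set in one step, before the append jobs are started, is what keeps stale child rows from surviving a replace.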
Docs
- Add example values to data types docs by @burnash in #516
- adding destination walkthrough by @rudolfix in #520
New Contributors
- @codingcyclist made their first contribution in #466
Full Changelog: 0.3.5...0.3.6
0.3.5
Core Library
- Fix incremental hitting end_value throwing out whole batches by @steinitzu in #495
- replace with staging tables by @sh-rp in #488
  A staging dataset may now be used to replace tables. You can choose from several replace strategies (https://dlthub.com/docs/general-usage/full-loading), including fully transactional and atomic replacement of the parent and all child tables, or an optimized strategy that uses, e.g., table cloning and copy-on-write in BigQuery and Snowflake.
- detect serverless aws_lambda by @muppinesh in #490
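The end_value fix can be pictured as filtering each item individually instead of discarding a whole batch once one item reaches end_value. This is a hypothetical plain-Python sketch, not dlt's incremental code; the helper filter_batch and the half-open range [initial_value, end_value) are assumptions, and dlt's exact boundary semantics may differ.

```python
from typing import Any, Dict, List


def filter_batch(
    batch: List[Dict[str, Any]], cursor: str, initial_value: Any, end_value: Any
) -> List[Dict[str, Any]]:
    """Keep items whose cursor lies in [initial_value, end_value); drop only out-of-range ones."""
    return [item for item in batch if initial_value <= item[cursor] < end_value]


batch = [
    {"id": 1, "updated_at": 5},
    {"id": 2, "updated_at": 9},
    {"id": 3, "updated_at": 12},  # past end_value: only this item is dropped
]
print(filter_batch(batch, "updated_at", initial_value=0, end_value=10))
```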
Docs
- staging docs update by @rudolfix in #496
- Updates to verified sources by @dat-a-man
New Contributors
- @muppinesh made their first contribution in #490
Full Changelog: 0.3.4...0.3.5
0.3.4
Core Library
- staging for loader files implemented by @sh-rp in #451
- staging for redshift on s3 bucket and json + parquet by @sh-rp in #451
- staging for bigquery on gs bucket and json + parquet by @sh-rp in #451
- staging for snowflake on s3+gs buckets and json + parquet by @sh-rp in #451
- improvements and bugfixes for parquet generation by @rudolfix in #451
- tracks helpers usage and source names by @rudolfix in #497
- Fix: use sets to prevent unnecessary truncate calls by @z3z1ma in #481
Docs
- staging docs update by @sh-rp in #485
- rewritten documentation for destinations by @rudolfix, @AstrakhantsevaAA and @dat-a-man
- adds category pages for sources and destinations by @rudolfix in #486
- Clarifies create-a-pipeline docs by @willi-mueller in #493
New Contributors
- @willi-mueller made their first contribution in #493
Full Changelog: 0.3.3...0.3.4
0.3.3
Core Library
- supports motherduck as a destination by @rudolfix in #460
- dbt 1.5 compatibility, enabled motherduck dbt support by @sh-rp in #475
- adds more retry conditions and makes timeouts configurable in the dlt requests drop-in replacement by @steinitzu in #477
- end_value support for incremental: backloading in parallel chunks is now possible by @steinitzu in #467
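Backloading in parallel chunks starts by splitting one large date range into independent windows. The sketch below is a hypothetical plain-Python illustration of that splitting step only; the helper split_range and the half-open windows are assumptions, not part of dlt.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_range(start: date, end: date, days: int) -> List[Tuple[date, date]]:
    """Split [start, end) into consecutive half-open chunks of at most `days` days (hypothetical)."""
    chunks: List[Tuple[date, date]] = []
    step = timedelta(days=days)
    current = start
    while current < end:
        chunk_end = min(current + step, end)
        chunks.append((current, chunk_end))
        current = chunk_end
    return chunks


for initial_value, end_value in split_range(date(2023, 1, 1), date(2023, 1, 10), days=4):
    print(initial_value, end_value)
```

Each (initial_value, end_value) pair would then drive one independent incremental run, so the chunks can be loaded in parallel without overlapping.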
Docs
- deploy cloud function as webhook by @dat-a-man in #449
- several key sections were updated and refactored by @AstrakhantsevaAA
- destination documentation refactor by @rudolfix in #478
Full Changelog: 0.3.2...0.3.3