Releases: dlt-hub/dlt
0.4.11
Core Library
- RESTClient: building blocks (auths, paginators, response extractors etc.) to write REST API pipelines by @burnash
- Enable merge write disposition for athena Iceberg by @jorritsandbrink in #1315
- adds std pipe iterator for stdout and stderr by @rudolfix in #1321
- adds _impl_cls to dlt.resource and dynamic config section to standalone resources with dynamic names by @rudolfix in #1324
- Accept :memory: mode for credentials parameter in duckdb factory by @sultaniman in #1297
- allows windows native, UNC and extended paths in filesystem source and destination by @rudolfix in #1335
- improves union validation: user friendly exceptions by @rudolfix in #1327
- improves instantiation and shutdown of thread pools for telemetry trackers by @rudolfix in #1340
- feat(airflow): pass data sources as callables and additional initializers for delayed source evaluation by @IlyaFaer in #1318
- Fix: ignores table options on ALTER TABLE in BigQuery by @rudolfix in #1306
- Fix: use correct check for column prop in column schema by @z3z1ma in #1347
- Streamlit caching and session state store fixes by @sultaniman in #1326
- implements method to merge columns in two table schemas by @rudolfix in #1348
- Extend motherduck client configuration to pass custom user agent by @sultaniman in #1284
- allows fsspec until 2023.1.0 by @rudolfix in #1305
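As a rough illustration of what "paginators" and "response extractors" mean in the RESTClient entry above, here is a minimal standard-library sketch. It is not the dlt RESTClient API: the `paginate` function, the `results` and `next` keys, and the `FAKE_API` stand-in are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    data_key: str = "results",
    next_key: str = "next",
) -> Iterator[List[Any]]:
    """Yield the items of each page until the response carries no next cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # "response extractor": pull the list of records out of the payload
        yield page.get(data_key, [])
        # "paginator": decide where the next page is, or stop
        cursor = page.get(next_key)
        if not cursor:
            break


# Hypothetical stand-in for an HTTP call, so the sketch runs without a network.
FAKE_API: Dict[Optional[str], Page] = {
    None: {"results": [1, 2], "next": "p2"},
    "p2": {"results": [3], "next": None},
}

pages = list(paginate(FAKE_API.__getitem__))
```

Keeping the "where is the data" and "where is the next page" decisions separate is what makes such pieces reusable across APIs; the linked REST Client documentation describes the actual classes.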
Docs
- REST Client documentation by @burnash https://dlthub.com/docs/general-usage/http/rest-client
- REST API verified source documentation by @burnash @willi-mueller @francescomucio https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api
- Docs/google ads by @dat-a-man in #1313
- Docs: Freshdesk documentation by @dat-a-man in #1228
- Add instruction on installing dlt via pixi and conda by @sultaniman in #1332
Verified Sources
- rest_api verified source: quickly declare REST API endpoints and convert it into regular dlt source by @burnash @willi-mueller @francescomucio
- rest_api launch blog by @adrianbr in #1355
Full Changelog: 0.4.10...0.4.11
0.4.10
Core Library
- Clickhouse destination by @Pipboyguy in #1097
- fix(filesystem): UNC paths are supported on filesystem source and destination by @IlyaFaer in #1209
- scd2 extension: pick your active record literal, defaults to NULL by @jorritsandbrink in #1275
- make missing keys warning conditional on merge strategy by @jorritsandbrink in #1290
- Fix filesystem layout timestamps with milliseconds by @sultaniman in #1286
- fallbacks to copy on any OSError when doing hardlink by @rudolfix in #1302
- configurable anonymous telemetry tracker by @rudolfix in #1301
- fix athena edge case and adds layout tests for athena by @sh-rp in #1289
- Streamlit app: do not show a notice if there is no resource state for schema by @sultaniman in #1300
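To make the "active record literal" in the scd2 entry above concrete, the sketch below applies one snapshot to a small version table in plain Python. It only illustrates the general slowly-changing-dimension type 2 idea and is not dlt's implementation: the function name, the `valid_from` / `valid_to` column names and the snapshot semantics are assumptions of this example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def scd2_apply(
    current: List[Row],
    incoming: List[Row],
    key: str,
    load_ts: str,
    active_literal: Optional[str] = None,
) -> List[Row]:
    """Return the version table after applying one full snapshot."""
    incoming_by_key = {row[key]: row for row in incoming}
    result: List[Row] = []
    still_active = set()
    for row in current:
        if row["valid_to"] != active_literal:
            result.append(row)  # already retired, keep as history
            continue
        payload = {k: v for k, v in row.items() if k not in ("valid_from", "valid_to")}
        if incoming_by_key.get(row[key]) == payload:
            result.append(row)  # unchanged, stays active
            still_active.add(row[key])
        else:
            # changed or missing from the snapshot: retire the active version
            result.append({**row, "valid_to": load_ts})
    for k, row in incoming_by_key.items():
        if k not in still_active:
            # new or changed record becomes the active version
            result.append({**row, "valid_from": load_ts, "valid_to": active_literal})
    return result


v1 = scd2_apply([], [{"id": 1, "name": "a"}], key="id", load_ts="t1")
v2 = scd2_apply(v1, [{"id": 1, "name": "b"}], key="id", load_ts="t2")
```

With the default `active_literal=None` the active version is marked by a NULL end timestamp; passing a value such as `"9999-12-31"` marks it with that literal instead, which some query engines handle more conveniently in range filters.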
Docs
- Docs: Google Ads documentation. by @dat-a-man in #1224
- explains how to pass explicit credentials + few mssql cases by @rudolfix in #1299
Full Changelog: 0.4.9...0.4.10
0.4.9
Core Library
- SCD2 support by @jorritsandbrink in #1168 https://dlthub.com/devel/general-usage/incremental-loading#scd2-strategy
- A fully configurable layout for filesystem files by @sultaniman in #1182 https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#files-layout
- picks file format matching item format to minimize number of rewrites during loading by @rudolfix in #1222
- fix athena iceberg's trailing location by @romanperesypkin in #1230
- Pass options to parse iso like strings by @VioletM in #1219
- pipeline state can be restored from filesystem destination by @sh-rp in #1184 - https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#syncing-of-dlt-state
- Remove staging-optimized replace strategy for synapse by @jorritsandbrink in #1231
- fixes bug where configs were not injected for async functions by @sh-rp in #1241
- feat(transform): implement columns pivot map function by @IlyaFaer in #1152
- Add max_table_nesting to resource decorator by @sultaniman in #1242
- adds csv options to write headers, change delimiter, quotation style by @rudolfix in #1239
- Check for default schema and schema name in streamlit session by @sultaniman in #1155
- Add seconds and millisecond timestamps to filesystem date placeholders by @sultaniman in #1260
- send dlt telemetry wherever you want, not only segment by @zem360 in #1236
- Make merge write-disposition fall back to staging append if no primary or merge keys are specified by @sh-rp in #1225
- Add snowflake application parameter to configuration by @sultaniman in #1266
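The configurable filesystem layout mentioned above is, conceptually, a path template whose placeholders are filled in per file. The sketch below shows that idea with plain `str.format`. The placeholder names used here (`table_name`, `YYYY`, `MM`, `DD`, `load_id`, `file_id`, `ext`) and the `render_layout` helper are assumptions for illustration; the linked files-layout documentation lists the placeholders dlt actually supports.

```python
from datetime import datetime, timezone


def render_layout(layout: str, **params: str) -> str:
    """Fill a path template; raises KeyError if a placeholder has no value."""
    return layout.format(**params)


# Fixed timestamp so the example is reproducible.
now = datetime(2024, 4, 1, 12, 30, 5, tzinfo=timezone.utc)

path = render_layout(
    "{table_name}/{YYYY}/{MM}/{DD}/{load_id}.{file_id}.{ext}",
    table_name="events",
    YYYY=now.strftime("%Y"),
    MM=now.strftime("%m"),
    DD=now.strftime("%d"),
    load_id="1711974605",
    file_id="0",
    ext="jsonl",
)
```

Date-based placeholders like these are what make it possible to write files into partition-style folders without post-processing.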
Docs
- Added docs for deploying dlt with Prefect. by @dat-a-man in #1138
- a note on scd2 incoming high ts change by @rudolfix in #1273
- adding images and wordsmithing to Prefect walkthrough by @WillRaphaelson in #1276
Verified Sources
- Use pyarrow, pandas, connectorx or sqlalchemy backends when reading tables with sql_database. See README for details. dlt-hub/verified-sources#425
- Google ads source is available dlt-hub/verified-sources#428
- Pages endpoint for notion dlt-hub/verified-sources#429
New Contributors
- @romanperesypkin made their first contribution in #1230
- @WillRaphaelson made their first contribution in #1276
Full Changelog: 0.4.8...0.4.9
0.4.9a2
A pre-release for trying out the following features and bugfixes:
- SCD2 support by @jorritsandbrink in #1168 (we are still working on BigQuery support) https://dlthub.com/devel/general-usage/incremental-loading#scd2-strategy
- A fully configurable layout for filesystem files by @sultaniman in #1182 https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#files-layout
- picks file format matching item format by @rudolfix in #1222
- fix athena iceberg's trailing location by @romanperesypkin in #1230
- Pass options to parse iso like strings by @VioletM in #1219
- filesystem state sync by @sh-rp in #1184 - https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#syncing-of-dlt-state
- Remove staging-optimized replace strategy for synapse by @jorritsandbrink in #1231
- fixes bug where configs were not injected for async functions by @sh-rp in #1241
- adds options to write csv headers, change delimiter by @rudolfix in #1239
Final release is scheduled for next week
0.4.8
Core Library
- Add Dremio as a destination by @maxfirman in #1026
- adds a fast loading of arrow tables/pandas to postgres via COPY csv by @rudolfix in #1185
- adds a csv writer for filesystem and postgres by @rudolfix in #1185
- saves parquet with all logical types, spark flavor is not a default any longer by @rudolfix in #1185
- feat(bigquery): add streaming inserts support by @IlyaFaer in #1123
- Feat: parameterize pipeline class in the primary factory method by @z3z1ma in #1176
- Fix: check for typeddict before class or subclass checks which fail by @z3z1ma in #1160
- fixes column order and add hints table variants by @rudolfix in #1127
- fixes schema versioning by @rudolfix in #1140
- regular initializers for credentials / config specs are type checked like dataclasses by @rudolfix in #1142
- fix streamlit app state display: Add yaml representer for pendulum datetime by @sultaniman in #1192
- synapse and mssql bugfixes and improvements (INSERT VALUES UNION) by @jorritsandbrink in #1174
- various improvements to arrow table normalization by @rudolfix in #1185
- arrow tables without rows create tables in destination by @rudolfix in #1185
- fixes Motherduck configuration to use my_db default database and makes password / token mandatory by @rudolfix in
Docs
- docs: add typechecking to embedded snippets by @sh-rp in #1130
- Fix typo with switched column names in schema evolution docs page by @b-per in #1132
- Docs: deploy with Kestra by @dat-a-man in #1087
- Docs: Deploy dlt on dagster by @dat-a-man in #1086
- Update example connection string by @MiConnell in #1188
- Changed directory of all the blog images to google cloud storage. by @dat-a-man in #1156
Verified Sources
- postgres replication / CDC by @jorritsandbrink dlt-hub/verified-sources#392
New Contributors
- @b-per made their first contribution in #1132
- @MiConnell made their first contribution in #1188
- @maxfirman made their first contribution in #1026
Full Changelog: 0.4.7...0.4.8
0.4.7
Core Library
- Custom destinations with @dlt.destination decorator by @sh-rp in #1065
- A BigQuery custom destination supporting STRUCT data types by @sh-rp in #1107
- Built-in Streamlit rewrite, UI improvements, dark theme by @sultaniman in #1060
- fixes various edge cases with Incremental data deduplication, for ordered and unordered results #971 by @rudolfix in #1062
- Adds new dlt.mark marker to materialize table schemas without data by @rudolfix in #1122
- validates class instances in typed dict by @rudolfix in #1082
- feat(airflow): allow re-using sources in airflow wrapper by @IlyaFaer in #1080
- feat(core): drop default value for write disposition by @IlyaFaer in #1057
- splits pandas and arrow imports to fix pyarrow.compute missing by @rudolfix in #1112
- improve no schema upgrade path exception by @sh-rp in #1125
Docs
- docs(airflow): add description of new decompose methods by @IlyaFaer in #1072
- check embedded code blocks by @sh-rp in #1093
- docs(kafka): describe the possible sync issues by @IlyaFaer in #1100
- Docs: schema evolution by @dat-a-man in #1078
- Add example link to the custom destination page by @VioletM in #1120
Full Changelog: 0.4.6...0.4.7
0.4.6
Core Library
- feat(airflow): expose the Airflow runner method to create custom DAGs by @IlyaFaer in #1014
- removes sql alchemy dependency and port parts of URL class by @rudolfix in #1028
- Parallelize decorator - run many regular generators in parallel by @steinitzu in #965
- Add main entry point to support calling dlt as python module by @sultaniman in #1023
Library Bugfixes
- fixes naive datetime bug in incremental by @rudolfix in #1020
- Import missing pyarrow compute for transforms on arrowitems by @sh-rp in #1010
- delete normalized package in case it already existed by @sh-rp in #1012
- fix(core): validation error with TTableHintTemplate by @IlyaFaer in #1039
- adds test case where payload data contains PUA unicode characters by @willi-mueller in #1053
- fix add_limit behavior in edge cases by @sh-rp in #1052
- adds row_order to Incremental - automatically stop taking data when out of range by @rudolfix in #1041
- Fix to serialize load metrics as list instead of a dictionary by @sultaniman in #1051
- fix import schema workflow by @sh-rp in #1013
- rollback all changes to live schemas when extraction fails by @sh-rp in #1013
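The row_order entry above ("automatically stop taking data when out of range") relies on a simple observation: if rows arrive sorted by the cursor column, the first row past the end of the range proves that no later row can qualify. A minimal sketch of that idea follows. It is not dlt's Incremental class; the function name, the inclusive-start / exclusive-end range and the sample generator are assumptions of this example.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def take_in_range(
    rows: Iterable[Row],
    cursor_field: str,
    start_value: Any,
    end_value: Any,
    ascending: bool = True,
) -> Iterator[Row]:
    """Yield rows with start_value <= cursor < end_value."""
    for row in rows:
        value = row[cursor_field]
        if value >= end_value:
            if ascending:
                break  # ordered input: nothing later can be in range
            continue  # unordered input: keep scanning
        if value >= start_value:
            yield row


consumed: List[int] = []


def source() -> Iterator[Row]:
    # Records how many rows were actually pulled from the "API".
    for i in range(1, 8):
        consumed.append(i)
        yield {"id": i}


selected = [row["id"] for row in take_in_range(source(), "id", 2, 5)]
```

Here only five of the seven rows are ever requested, which is the saving that declaring the row order buys when the underlying source is a paginated API or a large query.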
Docs
- Fix zendesk example test by @VioletM in #1027
- Edit arrow-pandas.md and fix a typo by @Bl3f in #1001
- Added info about file compression to filesystem docs by @dat-a-man in #975
- Update "create destination" docs with new file layouts by @steinitzu in #1032
- Docs update on how to set query limits. by @dat-a-man in #973
- Docs/Updated for slack alerts. by @dat-a-man in #1042
Verified Sources
- scrape web sites with spiders and Scrapy and send data to dlt @sultaniman dlt-hub/verified-sources#332
- sql_database recognizes end_value and row_order to return rows in range and optionally ordered. backfill and proper Airflow intervals support @rudolfix dlt-hub/verified-sources#388
Full Changelog: 0.4.5...0.4.6
0.4.5
Core Library
- enables google drive filesystem for sources and destinations (second one experimental, google drive listings are only eventually consistent!) by @IlyaFaer in #932
- creates parallel Airflow DAGs in airflow helper to allow many resources to be executed at once @IlyaFaer in #966
- 855 create bigquery adapter for dlt resources: easily configure partitions, clustering, data retention etc. by @Pipboyguy in #952 and https://dlthub.com/docs/dlt-ecosystem/destinations/bigquery#bigquery-adapter
- Use BIGNUMERIC for large decimals in bigquery by @steinitzu in #984
- Normalize keys for Google secrets config provider by @sultaniman in #963
- does not lowercase postgres and redshift database names by @rudolfix in #990
- Introduce hard_delete and dedup_sort columns hint for merge by @jorritsandbrink in #960 and https://dlthub.com/docs/general-usage/incremental-loading#delete-records
- adjustment of pua start in typed json encoding, pass through on decoding errors by @rudolfix in #974
- creates isolated parallel Airflow DAGs in airflow helper to execute resources parallel in isolated pipelines @IlyaFaer in #979
- Fix annotation processing and rebuilding, mark dataclass as complex by @sultaniman in #980
- allows async functions to be decorated with dlt.source by @rudolfix in #985
- allows right pipe operator to feed simple lists into a transformer @rudolfix in #985
- allows pendulum datetime as incremental cursor when loading arrow tables @rudolfix in #985
- enables Python 3.12 (mind that not all extras have python 3.12 libraries!) @rudolfix in #985
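The hard_delete and dedup_sort hints listed above can be pictured as two extra steps in a merge: first pick one row per key from the incoming batch, then either upsert it or remove the key from the destination. The plain-Python sketch below illustrates that flow under stated assumptions. It is not how dlt executes a merge (that happens in SQL at the destination), and the function signature, the "first seen wins" fallback and the sample column names are invented for this example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def merge_batch(
    destination: List[Row],
    batch: List[Row],
    primary_key: str,
    dedup_sort: Optional[str] = None,
    hard_delete: Optional[str] = None,
) -> List[Row]:
    """Upsert a batch into destination rows, honoring the two hints."""
    # Step 1: deduplicate the batch, keeping the highest dedup_sort value
    # per key (or the first row seen when no sort column is given).
    best: Dict[Any, Row] = {}
    for row in batch:
        k = row[primary_key]
        if k not in best:
            best[k] = row
        elif dedup_sort is not None and row[dedup_sort] > best[k][dedup_sort]:
            best[k] = row
    # Step 2: apply the surviving rows as upserts or deletes.
    merged: Dict[Any, Row] = {row[primary_key]: row for row in destination}
    for k, row in best.items():
        if hard_delete is not None and row.get(hard_delete):
            merged.pop(k, None)  # truthy delete marker removes the record
        else:
            merged[k] = row
    return list(merged.values())


dest = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
batch = [
    {"id": 1, "v": "new", "seq": 2, "deleted": False},
    {"id": 1, "v": "stale", "seq": 1, "deleted": False},
    {"id": 2, "v": "x", "seq": 1, "deleted": True},
]
result = merge_batch(dest, batch, "id", dedup_sort="seq", hard_delete="deleted")
```

Deduplicating before applying deletes matters: it ensures that a late "delete" record and an earlier "update" for the same key are resolved by the sort column rather than by arrival order.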
Docs
- docs(filesystem): include Google Drive into filesystem tutorial by @IlyaFaer in #962
- Fix typos/grammar in tutorial docs by @taljaards in #972
- add blog post observability by @adrianbr in #989
- Update arrow-pandas.md by @snehangsude in #992
- Clarify info about GoodData in modelling tools article by @mhauzirek in #956
- Fix small typings in contributing guide by @VioletM in #993
- Docs/google sheets update by @dat-a-man in #976
- Added "Incremental Configuration" section to SQL Databases documentat… by @dat-a-man in #977
Verified Sources
- Bing Webmaster source by @willi-mueller
New Contributors
- @taljaards made their first contribution in #972
- @mhauzirek made their first contribution in #956
- @snehangsude made their first contribution in #992
- @VioletM made their first contribution in #993
Full Changelog: 0.4.4...0.4.5
0.4.4
Core Library
- passes incremental from apply hints to resource function by @rudolfix in #953
- Handle UnionType when checking is_union_type and is_optional_type by @sultaniman in #951
- yanks orjson to <=0.3.10 by @rudolfix in #958
Docs
- Databricks workspace setup docs by @steinitzu in #949
Verified Source
- allows for table reflection at runtime, column selection and buffer control in sql_database @rudolfix (dlt-hub/verified-sources#351)
Full Changelog: 0.4.3...0.4.4
0.4.3
Core Library
- Databricks destination by @steinitzu and @phillem15 in #892
- Synapse destination by @jorritsandbrink in #900
- BigQuery Partitioning Improvements by @Pipboyguy in #887
- enable async generators as resources by @sh-rp in #905
- fix: use truthy value in ternary since 0 cause div by zero by @z3z1ma in #902
- feat(filesystem): add compression flag if the read file is GZ by @IlyaFaer in #912
- Enhancements in Filesystem Configuration by @Pipboyguy in #869
- add mark function to emit resource hints from decorated function by @rudolfix in #938
- handles nested Pydantic models when generating dlt schema by @sultaniman in #901
Docs
- Restructure intro, getting started and tutorial by @burnash in #702
- Update the release instructions in CONTRIBUTING.md by @burnash in #867
- Add explicit sub section about streamlit under getting started by @sultaniman in #884
- Examples: google sheets by @AstrakhantsevaAA in #846
- Added URL-parser documentation by @dat-a-man in #909
Verified Sources
- feat(filesystem): implement a csv reader with duckdb engine @IlyaFaer dlt-hub/verified-sources#319
- fix(notion): define payload within the while-loop @glebzhidkov (dlt-hub/verified-sources#338)
- sql alchemy + connector x example @rudolfix (dlt-hub/verified-sources#334)
- Shopify: Standalone resource for partner API queries @steinitzu (dlt-hub/verified-sources#329)
- sql-database: detect precision and scale of supported column types @steinitzu (dlt-hub/verified-sources#324)
- feat(sources.kafka): implement Kafka source @IlyaFaer (dlt-hub/verified-sources#306)
New Contributors
- @Pipboyguy made their first contribution in #869
- @sultaniman made their first contribution in #883
Full Changelog: 0.4.2...0.4.3