Releases: dlt-hub/dlt
0.3.8
Core Library
- use Airflow (and possibly other) schedulers with dlt resources by @rudolfix in #534
  A really cool feature that allows your incremental loading to take date ranges from Airflow schedulers. Do backfilling and incremental loading, and rely on Airflow to keep the pipeline state.
- Ignore hints prefixed with 'x-' in table_schema() by @burnash in #525
- Now our CI works correctly from forks! by @steinitzu in #530
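The scheduler-driven loading in #534 boils down to letting Airflow's schedule interval, rather than stored pipeline state, define the date range a run loads. A minimal sketch of that mapping in plain Python (an illustrative helper, not the dlt API):

```python
from datetime import datetime, timezone


def incremental_range(data_interval_start: datetime, data_interval_end: datetime):
    """Map an Airflow schedule interval onto the (initial_value, end_value)
    pair an incremental resource would load.

    The scheduler, not the pipeline state, decides which slice of data each
    run covers -- which is what makes backfills repeatable."""
    return data_interval_start.isoformat(), data_interval_end.isoformat()


# Each scheduled run (or backfill run) gets exactly one interval:
start, end = incremental_range(
    datetime(2023, 7, 1, tzinfo=timezone.utc),
    datetime(2023, 7, 2, tzinfo=timezone.utc),
)
```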
Support for unstructured data!
A really cool data source that lets you ask questions about your PDF documents and stores the answers in any of our destinations. Going from binary blobs through unstructured.io, vector databases and LLM queries to e.g. duckdb and bigquery. Blobs can come from the filesystem, Google Drive or your inbox (also incrementally) by @AstrakhantsevaAA
0.3.6
Core Library
- fixes lost data and incorrect handling of child tables during truncate-and-insert replace by @sh-rp in #499
  This is an important improvement that fixes a few holes in the truncate-and-insert replace mode (which has been there since the beginning of dlt). Now we truncate all the tables before the multithreaded append process starts. We also truncate child tables that could previously be left with stale data.
  details: #263 #271
- fixes deploy airflow secrets and makes toml the default layout by @rudolfix in #513
- check the required verified source dlt version during dlt init and warn users by @steinitzu in #514
- add schema version to _dlt_loads table by @codingcyclist in #466
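The fix in #499 comes down to ordering: truncate everything, including child tables, before any append worker touches the destination. A toy sketch of that ordering, with plain dicts standing in for destination tables (a hypothetical helper, not dlt internals):

```python
def replace_tables(tables, dataset, new_data):
    """Sketch of the fixed truncate-and-insert order: first truncate the
    parent table AND all of its child tables, then append the new rows.

    Doing all truncation up front -- before any (in dlt, multithreaded)
    append starts -- prevents stale child-table rows from surviving a
    replace load."""
    # step 1: truncate parent and child tables first
    for table in tables:  # e.g. ["items", "items__tags"]
        dataset[table] = []
    # step 2: only then append the freshly extracted data
    for table, rows in new_data.items():
        dataset[table].extend(rows)
    return dataset
```

Note that a child table with no new rows (here `items__tags`) still ends up empty rather than keeping rows from the previous load, which was the bug.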
Docs
- Add example values to data types docs by @burnash in #516
- adding destination walkthrough by @rudolfix in #520
New Contributors
- @codingcyclist made their first contribution in #466
Full Changelog: 0.3.5...0.3.6
0.3.5
Core Library
- Fix incremental hitting end_value throwing out whole batches by @steinitzu in #495
- replace with staging tables by @sh-rp in #488
  The staging dataset may now be used to replace tables. You can choose from several replace strategies (https://dlthub.com/docs/general-usage/full-loading), including a fully transactional and atomic replace of the parent and all child tables, or an optimized one that uses e.g. table cloning and copy-on-write in BigQuery and Snowflake.
- detect serverless aws_lambda by @muppinesh in #490
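Per the full-loading docs linked above, the replace strategy is selected via configuration; something along these lines in config.toml:

```toml
# config.toml -- choose how replace loads are executed
[destination]
replace_strategy = "staging-optimized"  # or "insert-from-staging", "truncate-and-insert"
```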
Docs
- staging docs update by @rudolfix in #496
- Updates to verified sources by @dat-a-man
New Contributors
- @muppinesh made their first contribution in #490
Full Changelog: 0.3.4...0.3.5
0.3.4
Core Library
- staging for loader files implemented by @sh-rp in #451
- staging for redshift on s3 bucket and json + parquet by @sh-rp in #451
- staging for bigquery on gs bucket and json + parquet by @sh-rp in #451
- staging for snowflake on s3+gs buckets and json + parquet by @sh-rp in #451
- improvements and bugfixes for parquet generation by @rudolfix in #451
- tracks helpers usage and source names by @rudolfix in #497
- Fix: use sets to prevent unnecessary truncate calls by @z3z1ma in #481
Docs
- staging docs update by @sh-rp in #485
- rewritten documentation for destinations by @rudolfix, @AstrakhantsevaAA and @dat-a-man
- adds category pages for sources and destinations by @rudolfix in #486
- Clarifies create-a-pipeline docs by @willi-mueller in #493
New Contributors
- @willi-mueller made their first contribution in #493
Full Changelog: 0.3.3...0.3.4
0.3.3
Core Library
- supports motherduck as a destination by @rudolfix in #460
- dbt 1.5 compatibility, enabled motherduck dbt support by @sh-rp in #475
- add more retry conditions and makes timeouts configurable in dlt requests drop-in replacement by @steinitzu in #477
- end_value support to incremental: backloading in parallel chunks now possible by @steinitzu in #467
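With end_value, a large backfill can be split into closed ranges that independent pipeline runs load in parallel: end_value stops one run exactly where the next chunk's initial_value begins. A plain-Python sketch of the chunking (a hypothetical helper, not part of dlt):

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def backfill_chunks(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Split [start, end) into non-overlapping (initial_value, end_value)
    ranges. Each chunk can be loaded by an independent pipeline run, so a
    large backfill parallelizes safely with no gaps and no overlap."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


# Nine days in 4-day chunks -> three runs, the last one shorter:
chunks = list(backfill_chunks(datetime(2023, 1, 1), datetime(2023, 1, 10), timedelta(days=4)))
```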
Docs
- deploy cloud function as webhook by @dat-a-man in #449
- several key sections were updated and refactored by @AstrakhantsevaAA
- destination documentation refactor by @rudolfix in #478
Full Changelog: 0.3.2...0.3.3
0.3.3a0
0.3.2
Core Library
- snowflake destination: we support loading via PUT stage (parquet and jsonl) and password and key pair authentication by @steinitzu in #414
- parquet files in load packages are supported with pyarrow. The following destinations accept them when loading: bigquery, duckdb, snowflake and filesystem, by @sh-rp in #403
- dbt-snowflake supported by dbt wrapper by @steinitzu in #448
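Whether a load can use parquet depends on the destination; as of this release the notes list bigquery, duckdb, snowflake and filesystem. A hypothetical helper capturing that choice (illustrative only, not dlt's capability mechanism):

```python
# Destinations that accepted parquet load files as of dlt 0.3.2,
# per the release notes above.
PARQUET_DESTINATIONS = {"bigquery", "duckdb", "snowflake", "filesystem"}


def loader_file_format(destination: str) -> str:
    """Pick parquet where the destination supports it, else fall back
    to the universally supported jsonl format."""
    return "parquet" if destination in PARQUET_DESTINATIONS else "jsonl"
```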
Docs
- Docs: polished reference's docs by @AstrakhantsevaAA in #430
- dhelp (AI assistant in docs) enabled by @burnash in #390
- Added deploy with google cloud functions by @dat-a-man in #426
- train-gpt-q&a-blog by @TongHere in #438
- adding the open api spec article by @rahuljo in #442
- Docs/user guide data scientists by @AstrakhantsevaAA in #436
- Docs: airflow intro by @AstrakhantsevaAA in #444
- documents snowflake destination by @rudolfix in #447
- add file formats and fill out the parquet page in docs by @sh-rp in #439
- Added filesystem destination docs by @dat-a-man in #440
Full Changelog: 0.3.1...0.3.2
0.3.1
What's Changed
- add computed exhausted property by @sh-rp in #380
- removes the unpickable lambdas from destination caps and updates tests by @rudolfix in #404
- add secrets format option to dlt deploy by @sh-rp in #401
- Feat: Use compression to maximize network and disk space efficiency by @z3z1ma in #415
- 379 round robin pipe iterator by @sh-rp in #421
Docs
- adding article by @TongHere in #411
- GPT Training fix link by @TongHere in #417
- Docs: deploy airflow by @AstrakhantsevaAA in #410
- restructured docs: new Getting Started and dlt Ecosystem by @rahuljo in #398 and @adrianbr in #408
- Added Jira Docs by @dat-a-man in #425
- add structured data lake, fix titles by @adrianbr in #419
- adds duckdb->bigquery walkthrough by @rudolfix in #392
- Added sql_database pipeline by @dat-a-man in #396
- Added stripe setup guide by @dat-a-man in #394
- Added Workable pipeline docs by @dat-a-man in #395
- Added salesforce docs by @dat-a-man in #413
- Added Notion Docs by @dat-a-man in #409
- Added Mux docs by @dat-a-man in #412
Full Changelog: 0.3.0...0.3.1
0.3.0
Core Library
- renames Pipelines to Verified Sources by @rudolfix in #382
- adds tests to build containers, removes psutil by @rudolfix in #373
- finalizes where the resource state is stored in pipeline state by @rudolfix in #374
- accepts explicit values for unions if type of value is one of types by @rudolfix in #377
- add quotes to missing dependency exception output by @sh-rp in #387
- Feat/Add transaction management for filesystem operations using fsspec by @z3z1ma in #384
Minor Version Changes
- The source name is now the key in pipeline state that stores all the source and resource state. Previously the source section (the name of the Python module where the source was defined) was used. This change affects already deployed pipelines whose source name differed from the module name: they will not see the previously stored state and may, for example, load some data twice. The only verified source affected by this is zendesk.
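To illustrate the key change (with a deliberately simplified, assumed state layout, not dlt's actual state schema): state written under the old module-section key becomes invisible to a lookup by source name.

```python
# Simplified illustration of the 0.3.0 state-key change. The dict layout
# here is an assumption for demonstration, not dlt's real state format.
# Before 0.3.0 resource state lived under the source *section* (the python
# module name); after, it lives under the source *name*.
old_state = {
    "sources": {"zendesk_module": {"resources": {"tickets": {"last_value": 100}}}}
}


def read_resource_state(state, source_name, resource):
    """After 0.3.0 state is looked up by source name; a deployed pipeline
    whose source name differs from its module name misses its old state."""
    return (
        state["sources"]
        .get(source_name, {})
        .get("resources", {})
        .get(resource, {})
    )
```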
Docs
- rewrites the sections on source, resource and pipeline state by @rudolfix in #376
- minor changes to schema evolution doc by @rahuljo in #372
- pushing experiment 4 blog by @rahuljo in #371
- update docusaurus and fix gtag by @sh-rp in #385
- add section landing pages to docusaurus by @sh-rp in #386
Full Changelog: 0.2.9...0.3.0
0.2.9
Core Library
- dlt source decomposition into Airflow DAG by @rudolfix in #352
- airflow dlt wrapper to run dlt pipelines as DAGs by @rudolfix in #357
- dlt deploy airflow-composer by @AstrakhantsevaAA in #356
- new destination: filesystem/bucket with fsspec by @steinitzu in #342
- Update deprecated GitHub action by @tungbq in #345
- A base class for vault config providers with two implementations Google Secrets config provider and Airflow config provider
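Decomposing a source into an Airflow DAG essentially turns each resource into its own task, executed one after another. A toy sketch of that serial decomposition (plain Python, not the dlt Airflow helper):

```python
from typing import Callable, List, Tuple


def decompose_serial(resources: List[Tuple[str, Callable[[], list]]]) -> List[dict]:
    """Run each resource as its own 'task', one after another, and record
    what each task loaded -- mimicking a serialized DAG decomposition where
    one Airflow task per resource runs in sequence."""
    log = []
    for name, extract in resources:
        rows = extract()  # in the real DAG this would be a pipeline run
        log.append({"task": name, "rows_loaded": len(rows)})
    return log
```

Running one pipeline per resource keeps task runtimes short and lets Airflow retry a single failed resource instead of the whole source.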
Docs
- pushing experiment 3 blog post by @rahuljo in #361
- structured data lakes post by @adrianbr in #362
- Several fixes and improvements by @tungbq
New Contributors
- @AstrakhantsevaAA made their first contribution in #356
Full Changelog: 0.2.8...0.2.9