Releases: dlt-hub/dlt

1.15.0

05 Aug 18:06
273420b

Breaking changes

This version adds the .gz extension to compressed files. This applies to filesystem destinations, the internal working directory, and staging locations used to feed other destinations. A few practical hints:

  • Existing filesystem destinations will continue storing files without the .gz extension and are not affected by the change (for backwards compatibility, existing datasets retain the behavior where the extension is not added)
  • Compressed files uploaded to staging destinations will now have the .gz extension, even if dlt is configured to keep data in the staging location
  • This does not apply to parquet files.
  • More information can be found in the filesystem destination docs: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#file-compression
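
Code that reads these files downstream may encounter both naming schemes for a while. The following is a minimal sketch, using only Python's standard library, of a reader that does not depend on the file name at all. The helper names (open_load_file, read_text) and the file names are hypothetical, and the sketch assumes compressed files are gzip-encoded, as the .gz extension suggests.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_load_file(path):
    """Open a text file, transparently decompressing gzip content.

    Detection uses the gzip magic bytes rather than the file name, so it
    works for compressed files both with and without the .gz extension.
    """
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_text(path):
    with open_load_file(path) as f:
        return f.read()


# Demo with hypothetical file names
tmp_dir = tempfile.mkdtemp()
old_style = os.path.join(tmp_dir, "items.jsonl")     # compressed, no extension
new_style = os.path.join(tmp_dir, "items.jsonl.gz")  # compressed, with .gz
plain = os.path.join(tmp_dir, "plain.jsonl")         # not compressed

for path in (old_style, new_style):
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write('{"id": 1}\n')
with open(plain, "w", encoding="utf-8") as f:
    f.write('{"id": 1}\n')
```

Checking content instead of the extension means the same reader works for datasets created before and after the upgrade.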

Core Library

  • [Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog by @bayees in #2674
  • feat - add crlf support for csv exports by @7amza79 in #2783
  • feat: add has_more boolean flag logic to RESTClient OffsetPaginator by @michaelconan in #2817
  • rest_api: fix: make ProcessingSteps filter and map fields optional by @burnash in #2913
  • Enable and test python 3.14 support by @sh-rp in #2789
  • removes init files from dlt tables in filesystem by @rudolfix in #2868
  • restclient: json param range paginator by @Giackgamba in #2917
  • fix sync destination warning logging call by @sh-rp in #2927
  • fix: missing __repr__ for @dlt.transformation by @zilto in #2940
  • fix: restclient: handle null data in response by @burnash in #2936
  • Fix: saving compressed load files with .gz extension by @anuunchin in #2835
  • fix: prevent DuplicateSchema error when using public schema in Redshift by @franloza in #2953
  • feat: Schema.to_dbml(), auto export schemas in dbml format by @zilto in #2929
  • QoL: improve DataValidationError output: use identifying columns if present by @djudjuu in #2915
  • callback collector by @djudjuu in #2922
  • skips inferring incomplete column when already incomplete by @rudolfix in #2935
  • 2946 sqlalchemy destination fixes (full support for mssql, partial for trino) by @rudolfix in #2951
  • adds precision to _dlt_load_id and _dlt_id columns by @rudolfix in #2951
  • adds json field support for mssql by @rudolfix in #2951
  • fixes clickhouse temporary table engine not propagating to nodes (failed merges fix) by @rudolfix in #2951
  • fixes BIGQUERY numeric creation (when scale was set to 0) by @rudolfix in #2951
  • fix: replace arrow2 with arrow backend for connectorx, enables newest connectorx versions by @zilto in #2933
  • AI Command: extended with IDEs (rules for all major IDEs are supported) by @anuunchin in #2937
  • duckdb bumped to 1.3.2, iceberg scanners updated by @rudolfix in #2958
  • Feat: Allow control over streamed_exec in delta merge upsert by @anuunchin in #2961
  • fix failing top level module imports on projects in dirs that start with a dot by @sh-rp in #2963
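
The has_more flag added to the OffsetPaginator (#2817) targets APIs that report whether another page exists rather than returning a total count. Below is a sketch of that stopping rule in plain Python. It is not dlt's implementation: fetch_page, the has_more response field, and the sample data are all hypothetical.

```python
def fetch_page(offset, limit):
    """Hypothetical API: returns a slice of records plus a has_more flag."""
    data = list(range(7))
    return {
        "items": data[offset:offset + limit],
        "has_more": offset + limit < len(data),
    }


def paginate(limit=3):
    """Yield pages, advancing the offset until has_more is false."""
    offset = 0
    while True:
        response = fetch_page(offset, limit)
        yield response["items"]
        # A missing flag is treated as "no more pages" in this sketch
        if not response.get("has_more", False):
            break
        offset += limit


pages = list(paginate())
```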

Full Changelog: 1.14.1...1.15.0

1.14.1

16 Jul 20:45
01d3242

Breaking Changes
If you used pipeline.dataset() with ibis syntax to write queries, please read below:

Core Library

  • fix filesystem config section by @sh-rp in #2865
  • fix: typing for updated datasets and relations Protocols by @zilto in #2870
  • Add workspace extra and rename marimo app to "pipeline dashboard" by @sh-rp in #2876
  • rest_api: Redact secrets in logs, add configurable response body in errors by @burnash in #2867
  • fixes range_start=open in incremental by @rudolfix in #2873
  • feat: autocompletion added for dataset and relation when in Notebook by @zilto in #2891
  • Fix logger.isEnabledFor() TypeError by @burnash in #2882
  • fixes arrow/pandas dependencies in extras and dep groups by @rudolfix in #2895
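
The range_start=open fix (#2873) concerns whether a row whose cursor value equals the last seen value is loaded again. Here is a sketch of the two boundary modes in plain Python. filter_new_rows is a hypothetical helper, not dlt's incremental implementation, and it assumes "open" excludes the start value while "closed" includes it.

```python
def filter_new_rows(rows, last_value, range_start="closed"):
    """Keep rows at or after the last seen cursor value.

    "closed" includes rows equal to last_value, "open" excludes them.
    """
    if range_start == "open":
        return [r for r in rows if r["updated_at"] > last_value]
    if range_start == "closed":
        return [r for r in rows if r["updated_at"] >= last_value]
    raise ValueError(f"unknown range_start: {range_start!r}")


rows = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
]
closed_ids = [r["id"] for r in filter_new_rows(rows, 20, "closed")]
open_ids = [r["id"] for r in filter_new_rows(rows, 20, "open")]
```

With a closed start the boundary row (id 2) is returned again and has to be deduplicated later; with an open start it is skipped.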

Docs

  • simplify playground setup cell by @sh-rp in #2857
  • do not run lancedb custom destination example test on forked subprocess by @djudjuu in #2854
  • Added troubleshooting steps for Databricks and other minor updates by @dat-a-man in #2871
  • add ibis dataset migration guide by @sh-rp in #2874
  • docs: adds documentation for column subset selection in sql_database source by @franloza in #2869

Full Changelog: 1.12.3...1.14.1

1.13.0

08 Jul 19:44
4ed21fd

Core library features

  • Extend CSV quoting options in CsvWriter by @burnash in #2810
  • rest_api: add HeaderCursorPaginator to configuration by @burnash in #2798
  • rest_api: Raise ValueError for incorrect auth config types by @burnash in #2799
  • feat(athena): apply lakeformation tags on database (cont.) by @rudolfix in #2808
  • Add sock argument for SFTPCredentials by @AyushPatel101 in #2803
  • Psycopg2SqlClient: accept extra options by @nicob3y in #2755
  • Update fruitshop source with slightly more data and setup that enables star schema demonstration by @sh-rp in #2845
  • return latest step info by @djudjuu in #2829
  • change secrets.toml file to sources.rest_api_pipeline.github by @kaliole in #2849
  • Chore: Pyiceberg's python constraint moved from project wide constraints by @anuunchin in #2839
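
For readers unfamiliar with what CSV quoting options control (#2810), here is a short illustration using Python's standard csv module. It does not show dlt's CsvWriter configuration, whose option names are not listed in these notes. render_row and the sample row are hypothetical.

```python
import csv
import io


def render_row(row, quoting):
    """Serialize one row to a CSV line using the given csv quoting mode."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


row = ["widget", 3, "red, large"]
minimal = render_row(row, csv.QUOTE_MINIMAL)          # quote only when needed
quote_all = render_row(row, csv.QUOTE_ALL)            # quote every field
non_numeric = render_row(row, csv.QUOTE_NONNUMERIC)   # quote non-numeric fields
```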

Full Changelog: 1.12.3...1.13.0

1.12.3

25 Jun 19:17
bc82cd0

Core Library

  • (feat) allows adding SQL statements to a schema migration that are executed after tables are created/altered by @rudolfix in #2791
  • Detect whether query just filters rows or is more complex with sqlglot by @anuunchin in #2619
  • (QoL):adds str and repr to dataset and relation by @rudolfix in #2796
  • fix: added @dlt.transformation to __all__ by @zilto in #2797
  • Fix: Null column type not inferred info/warning floods the output by @anuunchin in #2800
  • rest_api: allow processing multiple DltResource instances by @burnash in #2807
  • marimo app updates by @sh-rp in #2778
  • Hotfix - fix marimo start command by @sh-rp in #2812

CI

  • enable linting on python 3.13 by @sh-rp in #2790
  • run all common tests with --resolution lowest-direct on uv sync by @sh-rp in #2787

1.12.2a0

24 Jun 08:24
6138ef0
Pre-release

This is a pre-release of dlt and our first build with the uv package manager.

1.12.1

18 Jun 20:01
91420d7

Core Library

Quality of Life (fixing annoying little things)

  • 2529-INFO_TABLES_QUERY_THRESHOLD-as-parameter-from-config by @amirdataops in #2600
  • warn when resolving configs or secrets with placeholder values by @djudjuu in #2636
  • Prevent unnecessary preliminary connection in dataset by @sh-rp in #2645
  • QoL: warning with hint to provide data types for columns with exclusively None values by @anuunchin in #2633
  • Fix issue 2690: switch to packaging to remove warning on import dlt by @djudjuu in #2707
  • qol: exception formatting by @zilto in #2715
  • Regular and standalone resources are now the same thing. Both provide nice typed callables, can be renamed, and allow secrets and configs to be injected in the same way, including when they are part of an inner function. This unifies injection behavior for all our decorators.
    In the example below, (1) the access_token secret is allowed in the inner resource and (2) the limit argument with a default will be injected, e.g. from the LIMIT env variable, which was skipped before:
import dlt

@dlt.source
def source():
    @dlt.resource(write_disposition="merge", primary_key="_id")
    def documents(access_token=dlt.secrets.value, limit=10):
        # generate_json_like_data stands in for your own data generator
        yield from generate_json_like_data(access_token, limit)

    return documents

⚠️ We still do not recommend defining parametrized inner resources.

  • You can now return data from resources instead of yielding a single item. We do not recommend this, for the sake of code readability. dlt always wraps resources in generators, so your return will be converted to a yield.
  • To return a DltResource from a resource function you must explicitly type the return value:
import dlt
from dlt.sources import DltResource

@dlt.resource
def rv_resource(name: str) -> DltResource:
    return dlt.resource([1, 2, 3], name=name, primary_key="value")
  • Config resolve behavior is normalized: default values can be overridden from providers, but explicit values cannot.
  • ⚠️ Previously, if those values were instances of base configurations, behavior was inconsistent (explicit values were treated like defaults).
  • ⚠️ If a native value is found for a config that does not accept native values, config resolution will fail; previously it was ignored.
  • We use custom, consistent wrap and unwrap of functions. Our decorators preserve both the typing and the runtime signature of decorated functions. makefun got removed.
  • If an Incremental initializes from another Incremental as a native value, it copies the original type correctly.
  • dlt.resource can define a configuration section (also using lambdas).
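
The normalized resolve rule can be read as a precedence order: an explicit value wins, then a provider value, then the default. Here is a sketch of that order in plain Python. resolve_value and the dict-based provider are hypothetical stand-ins for dlt's config machinery, not its actual API.

```python
_MISSING = object()  # sentinel so that None can be a legitimate value


def resolve_value(key, default=_MISSING, explicit=_MISSING, provider=None):
    """Resolve a config value: explicit > provider > default."""
    provider = provider or {}
    if explicit is not _MISSING:
        # Explicit values are never overridden by providers
        return explicit
    if key in provider:
        # Providers (env variables, TOML files, ...) override defaults
        return provider[key]
    if default is not _MISSING:
        return default
    raise KeyError(f"config value {key!r} could not be resolved")


provider = {"limit": 100}
from_default = resolve_value("limit", default=10)
overridden = resolve_value("limit", default=10, provider=provider)
pinned = resolve_value("limit", default=10, explicit=5, provider=provider)
```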

Bugfixes and improvements

  • feat: Expand sql table resource config by @xneg in #2396
  • Added write_disposition to sql table config
  • Added primary_key and merge_key to sql table config
  • Feat: Support clustered tables with custom column order in BigQuery destination by @hsm207 in #2638
  • Feat: Add configuration propagation to deltalake.write_deltalake (#2629) by @gaatjeniksaan in #2640
  • Add support for creating custom integer-range partition table in BigQuery by @hsm207 in #2676
  • Upsert merge strategy for iceberg by @anuunchin in #2671
  • Feat/add athena database location option by @eric-pinkham-rw in #2708
  • motherduck destination config improvement: uppercase env var by @djudjuu in #2703
  • adds parquet support to postgres via adbc by @rudolfix in #2685
  • 2681 - fixes null on non null column arrow by @rudolfix in #2721
  • removes cffi version of psycopg2
  • mssql and snowflake bugfixes by @rudolfix in #2756
  • support for deltalake 1.0 by @rudolfix in #2721
  • allows skipping input data deduplication on delete-insert merge to decrease query cost in #2721
  • allows to configure configs and pragmas for duckdb, improves sql_client, tests by @rudolfix in #2730
  • logs resolved traces thread-wise, clears log between pipeline runs by @rudolfix in #2730

Chores & tech debt
We are switching to uv in the coming days, and:

  • Simplify workflow files by @sh-rp in #2663
  • fix/2677: remove recursive filewatching by @zilto in #2678
  • QoL: improved __repr__() for public interface by @zilto in #2630
  • fix: incrementally watch files by @zilto in #2697
  • Simplify pipeline test utils by @sh-rp in #2566 (we use data access and dataset for testing now)
  • added constants for load_id col in _dlt_loads table by @zilto in #2729
  • Update github workflow setup by @sh-rp in #2728
  • fixes leaking datasets tests by @rudolfix in #2730

🧪 Upgrades to data access

  • All SQL queries are destination agnostic. For example
  • Column lineage is computed and inferred. x-annotation hints are propagated
  • SqlModel represents a SQL query; it is processed in the extract and normalize steps and loaded in the load step
  • You can use scalar() on data access expressions, e.g.:
# get latest processed package id
max_load_id = pipeline.dataset()._dlt_loads.load_id.max().scalar()

🧪 Cool experimental stuff:

Check out our new embedded pipeline explorer app

dlt pipeline <name> show --marimo
dlt pipeline <name> show --marimo --edit

Use the --edit option to enable Notebook/edit mode in Marimo, plus a very cool Ibis dataset explorer.

Docs

  • docs: dlt+ iceberg destination partitioning by @burnash in #2686
  • docs: fix invalid bigquery reference in athena destination by @goober in #2700
  • docs: rest_api: clarify dlt resource and rest_api specific parameters by @burnash in #2710
  • docs: plus: add merge strategies for dlt+ Iceberg destination by @burnash in #2749
  • rest_api: document pagination hierarchy and add tests by @burnash in #2745
  • docs: add session parameter to rest_api client configuration by @burnash in #2746
  • docs: fix incorrect github_source function calls in tutorial by @axelearning in #2768

We updated contribution guidelines

  • By default we do not accept more destinations (except a few like DuckLake or Trino)
  • Each PR needs a test and (possibly) docs entry

Full Changelog: 1.11.0...1.12.0

1.12.0

17 Jun 21:43
f6a8f65

Important

We yanked this release from PyPI after discovering that the minimum allowed version of sqlglot could prevent dlt from being imported. This release has been replaced by version 1.12.1, which includes the same release notes.

Full Changelog: 1.11.0...1.12.0

1.11.0

15 May 11:22
4fc249b

Core Library

  • feat: adds iceberg table properties configuration for athena (#2546) by @olexanderos in #2555
  • use pendulum.parse instead of fromisoformat when deserializing pua json to avoid losing timezone info by @sh-rp in #2514
  • #2192 - adds base64 encoded PEM, private_key_path for Snowflake auth, improves docs by @rudolfix in #2569
  • cli: do not use field name as a placeholder name in generated TOML files by @burnash in #2468
  • Speedup CI: Cache google secrets by @sh-rp in #2581
  • Fix: drop corresponding staging table when original table is dropped by @anuunchin in #2567
  • normalize_py_arrow_item now replaces load id column with the right one by @anuunchin in #2526
  • explicit snowflake autocommit=true when connection opens by @rudolfix in #2593
  • list secrets in vault config provider to avoid calls to backend, now it is called only for known keys by @rudolfix in #2597
  • Enabling 'model' loader_file_format for athena, synapse and dremio by @anuunchin in #2556
  • refactor init-command for use in dlt project by @djudjuu in #2568
  • allows to pass config section to dlt.resource, fixes a few edge cases when configs for standalone resources are resolved by @rudolfix in #2583
  • enables fsspec per-thread instance cache and updates documentation, this prevents excessive memory usage reported by users by @rudolfix in #2621
  • bumps pendulum to 3.0.1, removes dlt flavored pendulum in Python 3.13 by @rudolfix in #2624
  • Feat/2609-clickhouse-precision-integer-mapping by @hsm207 in #2627
  • removes Airflow DummyOperator import, supports up to v 2.10 by @rudolfix in #2628
  • fixes fsspec instantiation in fs source (where kwargs were ignored) by @rudolfix in #2634

Docs

  • Docs: add documentation for CT columns by @akelad in #2552
  • docs: marimo docs page added by @zilto in #2584
  • athena: fix a typo in the athena_adapter docstring by @burnash in #2599
  • adds proper docs for Google Secrets vault config provider by @rudolfix in #2597
  • Extract dataset code snippets into tests snippets system by @sh-rp in #2598
  • fix some typos in cursor-restapi docs by @hsm207 in #2608
  • docs: Fix incorrect nesting in secrets.toml by @agrueneberg in #2614
  • fixes parquet data writer settings docs & rewrites configuration docs by @rudolfix in #2583
  • Added dedup sort example by @dat-a-man in #2235
  • docs: add advanced project tutorial by @sh-rp in #2338
  • docs: split incremental loading page by @burnash in #2592
  • Added info about dlt's internal tables by @dat-a-man in #2525
  • Added section using xmin for Change Data Capture (CDC) to pg_replication docs by @dat-a-man in #2535
  • Added Transform data with add_map documentation by @dat-a-man in #2500
  • cli: fix 404 in the "ai" command output by @burnash in #2643
  • [doc/dlt+] renames sidebar and adds intro snippet for snowflake+ by @rudolfix in #2642

Full Changelog: 1.10.0...1.11.0

1.10.0

22 Apr 18:15
3b332a3

Core Library

  • feat(paginator): enhance JSONResponseCursorPaginator to support cursor placement in request JSON body by @kang8 in #2446
  • fix: better import exception for numpy by @zilto in #2397
  • Fix read parquet chunk size by @diwu-sf in #2456
  • rest_api: enhance placeholder expansion to preserve value types by @burnash in #2462
  • list-destination flag added by @zilto in #2441
  • Fixes bigquery, ibis tests by @anuunchin in #2470
  • add ignore_unknown_values option to BigQuery destination by @xxntti3n in #2455
  • Fix: insert-from-staging replace strategy incorrectly using staging-optimized pattern by @anuunchin in #2435
  • fix: index invariant add_row_hash_to_table by @hagelborn in #2491
  • adds condition to check XDG_DATA_HOME to set global_dir value by @goosethedev in #2361
  • removes location tag from athena iceberg, fixes catalog name, allows for additional props by @rudolfix in #2478
  • adds default dataset/schema to ibis expressions, support for athena and databricks backends by @rudolfix in #2478
  • 2457-refactors iceberg and duckdb cache support by @rudolfix in #2430
  • fixes wrong resolve for WithLocalFiles configuration by @rudolfix in #2430
  • converts Iceberg fileio into dlt credentials by @rudolfix in #2430
  • bumps and simplifies deltalake, enables streaming appends and upserts by @rudolfix in #2430
  • fixes nullability warning on duckdb ALTER by @rudolfix in #2430
  • adds replace strategy selector, internal x-replace-strategy hint, removes sql_params by @rudolfix in #2430
  • borrows and returns sqlalchemy connections in destination and several other critical fixes to sqlite, connection management, merge operations by @rudolfix in #2430
  • executes all sql jobs (replace, copy, merge) in a transaction on all destinations by @rudolfix in #2430
  • shows info on locations for config providers when displaying exceptions, hides warnings when project context is present by @rudolfix in #2430
  • Fix/add s3 region for redshift staging destination related to 2349 by @yannik207 in #2389
  • Fix: sqlalchemy table names of type sqlalchemy.sql.elements.quoted_name causing deserialization error by @anuunchin in #2496
  • Preserve type of empty lists in incremental to allow to materialize empty resources by @sh-rp in #2511
  • Fix clickhouse sql syntax for sentinel table creation by @sh-rp in #2510
  • cli: dlt ai setup $IDE adds Cursor rules for the REST API source to your dlt project (and more) by @zilto in #2503
  • "model" item_format: support for lazily evaluated tables/data frames by @anuunchin in #2423
  • adds arrow ipc nested type encoding to json type by @rudolfix in #2519
  • [chore] imports addon function when generating destination test cases to extend implemented destinations and test cases. allows for followup job in job client tests by @rudolfix in #2519
  • allows iceberg to infer unified schema, allows csv files, uses explicit select list in sql_client filesystem by @rudolfix in #2519
  • allows to use context manager on dataset to keep internal connection open + lifecycle tests by @rudolfix in #2519
  • uses union by name when reading parquet via duckdb to see all columns when schema evolves by @rudolfix in #2519
  • makes vault provider to check known toml fragments also for non-secrets by @rudolfix in #2519
  • handles non existing iceberg tables in sql client and during registration by @rudolfix in #2519
  • refactors how deterministic 'temp' tables are named for destinations that do not support temp tables by @rudolfix in #2519
  • fixes open cursor not closed in mssql and athena sql_client by @rudolfix in #2519
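
Running SQL jobs (replace, copy, merge) in a transaction (#2430) means a failed job leaves the destination table unchanged instead of half-updated. The sketch below shows the idea with Python's built-in sqlite3 module. It is not dlt's job client code: run_replace_job, the items table, and the simulated failure are hypothetical.

```python
import sqlite3


def run_replace_job(conn, rows, fail=False):
    """Replace all rows in `items` atomically (delete + insert).

    Returns True when the job committed, False when it was rolled back.
    """
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute("DELETE FROM items")
            conn.executemany(
                "INSERT INTO items (id) VALUES (?)", [(r,) for r in rows]
            )
            if fail:
                raise RuntimeError("simulated job failure")
    except RuntimeError:
        return False
    return True


def current_ids(conn):
    return [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.execute("INSERT INTO items (id) VALUES (1), (2)")
conn.commit()

failed = run_replace_job(conn, [10, 20, 30], fail=True)
ids_after_failure = current_ids(conn)
succeeded = run_replace_job(conn, [10, 20, 30])
ids_after_success = current_ids(conn)
```

Because the delete and the insert share one transaction, the failed run restores the original rows rather than leaving an empty or partially filled table.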

Verified Sources (and rules)

We support the dlt ai command via https://github.com/dlt-hub/verified-sources/tree/master/ai

Full Changelog: 1.9.0...1.10.0

1.9.0

26 Mar 18:02
3577222

Core Library

  • Apply hints for nested tables: set data types, data contract via apply_hints or resource decorator by @steinitzu @rudolfix @zilto in #2165
  • Apply hints for nested tables: convert nested tables into root tables by setting primary/merge keys and write disposition by @steinitzu @rudolfix @zilto in #2165
  • rest_api: interpolate json and headers for top resources by @burnash in #2437
  • rest_api: added parametric expressions for headers by @francescomucio in #2262
  • feat(rest_client): allow specifying headers on a per-request basis by @joscha in #2434
  • rest_api: add tests for handling escaped braces in string interpolation by @burnash in #2416
  • Fix/133 ibis iceberg delta by @anuunchin in #2371
  • Adding sql dialect to destination capabilities by @anuunchin in #2393
  • un-deprecates force_iceberg on athena via new destination cap settings by @rudolfix in #2417
  • (bugfix) correctly handles bucket url with path for Snowflake S3 Stage + copy command generator refactor by @stevenayers in #2354
  • Optional engine_adapter_callback by @michelzurkirchen in #2427
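
Several rest_api entries above deal with interpolating placeholders into JSON bodies and headers while leaving escaped braces alone (#2437, #2416). The sketch below illustrates the general idea with Python's str.format, where doubled braces are literal. It assumes the same doubling convention and does not reproduce rest_api's own expansion logic. expand and the template are hypothetical.

```python
import json


def expand(template, **values):
    """Fill {name} placeholders; doubled braces {{ }} stay literal."""
    return template.format(**values)


# The outer doubled braces are escaped and become the JSON object braces
template = '{{"query": "owner:{owner}", "page_size": {page_size}}}'
body = expand(template, owner="dlt-hub", page_size=50)
parsed = json.loads(body)
```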

Full Changelog: 1.8.1...1.9.0