feat(destination): add Hotdata managed-database destination #4013

Open
eddietejeda wants to merge 2 commits into dlt-hub:devel from hotdata-dev:feat/hotdata-destination

@eddietejeda

Adds Hotdata as a first-party dlt destination:

dlt.destinations.hotdata

Hotdata is a managed-database service that accepts Parquet uploads through an HTTP API.

This destination implements JobClientBase and WithStateSync, the same interfaces used by other non-SQL destinations, so existing pipelines work with it without code changes.

Feature Support

| Feature | Status | Notes |
|---|---|---|
| replace | ✅ | Full table re-upload |
| append | ✅ | Permissive concat; schema drift handled |
| merge / upsert | ✅ | Client-side upsert by primary key |
| insert-only | ✅ | Inserts rows with no match; existing rows untouched |
| truncate-and-insert | ✅ | Declared replace strategy |
| Table nesting | ✅ | Effectively unlimited (`max_table_nesting=1000`), configurable per destination |
| dlt metadata columns | ✅ | `_dlt_id`, `_dlt_load_id`, `_dlt_parent_id`, etc. preserved |
| Pipeline state sync | ✅ | `WithStateSync`; survives across runs |
| Schema versioning | ✅ | Stored in the `_dlt_version` managed table |
| Load tracking | ✅ | Stored in the `_dlt_loads` managed table |
| Auto-create database | ✅ | Configurable via `create_database_if_missing` |
| Schema evolution | ✅ | Deletes and recreates the database with the union of old and new tables |
| Retry / backoff | ✅ | Configurable retries; exponential backoff capped at 30 seconds |
| Error classification | ✅ | Transient (408, 409, 425, 429, 5xx) vs. terminal |
| Parallelism strategy | ✅ | Table-sequential by default, configurable |
| Max table nesting | ✅ | Configurable per destination instance |
| Identifier normalisation | ✅ | `snake_case` convention: `[a-z0-9_]`; nested tables named `parent__child` |
| Dataset read API | ❌ | Requires `SqlJobClientBase` |
| SCD2 | ❌ | Requires server-side SQL |
| Staging area | ❌ | Hotdata has no staging concept |
| Type mapper | ❌ | Not needed; Parquet carries its own types |
| Clone table | ❌ | No API endpoint |
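
The retry and error-classification rows above can be illustrated with a minimal, dependency-free sketch. It assumes a 1-second base delay that doubles per attempt with no jitter, capped at 30 seconds. `HotdataHTTPError`, `is_transient`, `backoff_delay`, and `with_retries` are hypothetical names, and the defaults are illustrative rather than the values used in `_api_client.py` or `errors.py`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses listed as transient in the feature table (any 5xx is also transient).
TRANSIENT_STATUS_CODES = frozenset({408, 409, 425, 429})


class HotdataHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed API call."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_transient(status_code: int) -> bool:
    """True for retryable statuses: 408, 409, 425, 429, or any 5xx."""
    return status_code in TRANSIENT_STATUS_CODES or 500 <= status_code <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, never more than `cap`."""
    return min(cap, base * (2 ** attempt))


def with_retries(
    call: Callable[[], T],
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with capped exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except HotdataHTTPError as exc:
            # Terminal errors fail immediately; transient ones fail once retries run out.
            if not is_transient(exc.status_code) or attempt >= max_retries:
                raise
            sleep(backoff_delay(attempt, base, cap))
            attempt += 1
```

Injecting `sleep` keeps the retry loop testable without real waiting. A call that returns 503 twice and then succeeds sleeps 1 s and then 2 s, while a 400 is raised on the first attempt.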

New Files

```
dlt/destinations/impl/hotdata/
├── __init__.py
├── _api_client.py       # Hotdata SDK wrapper with retry logic
├── configuration.py     # HotdataCredentials + HotdataClientConfiguration
├── contracts.py         # Identifier normalisation, TableContract
├── errors.py            # Error classification: transient vs terminal
├── factory.py           # hotdata(Destination[...]) + capabilities
├── hotdata.py           # HotdataClient + HotdataLoadJob
├── merge.py             # combine_tables for all write dispositions
└── parquet.py           # Arrow → Parquet writer
tests/load/hotdata/
└── test_hotdata_client.py   # 44 unit tests
```

Related Issues

N/A: new destination.

Additional Context

  • Merge is executed client-side: fetch existing data, combine in Arrow, then re-upload. Server-side merge is planned for the next Hotdata API release and will eliminate the need for the fetch step.
  • loader_parallelism_strategy defaults to table-sequential to prevent concurrent read-modify-write races on the same table.
  • The Hotdata API requires tables to be declared at database creation time. When a new table appears mid-pipeline, the destination deletes and recreates the database with the union of existing and new declared tables, preserving all data.
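
The delete-and-recreate flow in the last bullet comes down to two steps: compute the union of declared tables, then restore a snapshot of the existing data. Below is a minimal sketch of those steps. It assumes an in-memory `dict` mapping table names to rows stands in for the managed database. `tables_to_declare` and `recreate_database` are hypothetical names, not functions from this PR, and the real destination performs these steps through Hotdata API calls that are not shown here.

```python
from typing import Dict, List, Optional, Sequence

Rows = List[dict]


def tables_to_declare(existing: Sequence[str], incoming: Sequence[str]) -> Optional[List[str]]:
    """Return the union of declared tables if `incoming` adds anything new, else None."""
    known = set(existing)
    new = [t for t in incoming if t not in known]
    if not new:
        return None  # every incoming table is already declared; no recreate needed
    # Keep the existing order first, then append new tables (deduplicated) in arrival order.
    return list(existing) + list(dict.fromkeys(new))


def recreate_database(db: Dict[str, Rows], incoming: Sequence[str]) -> Dict[str, Rows]:
    """Model delete-and-recreate on an in-memory stand-in for the managed database."""
    declared = tables_to_declare(list(db), incoming)
    if declared is None:
        return db
    # 1. Snapshot existing data before the database is deleted.
    snapshot = {name: list(rows) for name, rows in db.items()}
    # 2. Recreate with the union of tables and 3. restore the snapshot; new tables start empty.
    return {name: snapshot.get(name, []) for name in declared}
```

Because the snapshot is taken before the delete, a failure between the delete and the restore is where data could be lost in a real implementation. That makes the ordering of these steps the part worth reviewing most carefully.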

Ports the Hotdata destination into dlt as a first-party destination.
Hotdata uses Parquet uploads to a managed database API, with client-side
merge logic implemented in Python/Arrow.

Write dispositions: replace, append, merge, upsert, insert-only
Replace strategies: truncate-and-insert
Merge strategies: upsert, insert-only (dedup via _dlt_id fallback)
Table nesting: unlimited (max_table_nesting=1000, configurable)
State sync: WithStateSync (pipeline state, schema versioning, load tracking)
Metadata: all dlt columns preserved across user and internal tables
Retry: exponential backoff with transient/terminal error classification
Schema evolution: auto-recreates managed database with union of tables
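
The two merge strategies listed above can be sketched client-side as a fetch, combine, and re-upload step. The PR's `merge.py` (`combine_tables`) works on Arrow tables. To stay dependency-free, this sketch uses plain lists of dicts, and `combine_rows` is a hypothetical name. It assumes that upsert requires a primary key and that insert-only falls back to `_dlt_id` as the dedup key when no primary key is given.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def _key(row: Row, key_columns: Sequence[str]) -> Tuple[object, ...]:
    """Build a hashable key from the given columns of a row."""
    return tuple(row[c] for c in key_columns)


def combine_rows(
    existing: List[Row],
    incoming: List[Row],
    strategy: str,
    primary_key: Optional[Sequence[str]] = None,
) -> List[Row]:
    """Combine fetched rows with a new batch before re-uploading the whole table."""
    if strategy == "upsert":
        if not primary_key:
            raise ValueError("upsert requires a primary key")
        key_columns = list(primary_key)
        # Last incoming row per key wins; matching existing rows are dropped.
        incoming_by_key = {_key(r, key_columns): r for r in incoming}
        kept = [r for r in existing if _key(r, key_columns) not in incoming_by_key]
        return kept + list(incoming_by_key.values())
    if strategy == "insert-only":
        # Assumed fallback: dedup on _dlt_id when no primary key is declared.
        key_columns = list(primary_key) if primary_key else ["_dlt_id"]
        seen = {_key(r, key_columns) for r in existing}
        result = list(existing)
        for r in incoming:
            k = _key(r, key_columns)
            if k not in seen:  # existing rows (and earlier incoming duplicates) win
                seen.add(k)
                result.append(r)
        return result
    raise ValueError(f"unsupported merge strategy: {strategy!r}")
```

In this sketch, upsert moves updated rows to the end of the table rather than keeping their original position. That is harmless when the result is re-uploaded as a full table, as described for this destination.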

runtimedb.local enforces lowercase [a-z0-9_] identifiers with __ as the
nested table separator, which matches snake_case semantics exactly. The direct
naming convention passes identifiers through unchanged and uses ▶ as the
separator, which the Hotdata API rejects.
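
To show why snake_case fits, here is a simplified normalizer that produces only `[a-z0-9_]` identifiers and joins nested table paths with `__`. This is a sketch under stated assumptions: it is not dlt's actual `snake_case` implementation, which handles more edge cases. `normalize_identifier`, `nested_table_name`, and `is_valid_identifier` are hypothetical helpers, and the rule that an empty identifier becomes `_` is an illustrative assumption.

```python
import re
from typing import Iterable

_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
_INVALID_RUN = re.compile(r"[^a-z0-9_]+")
_VALID = re.compile(r"^[a-z0-9_]+$")
NESTED_SEPARATOR = "__"


def normalize_identifier(name: str) -> str:
    """Simplified snake_case-style normalizer producing only [a-z0-9_] characters."""
    # Split camelCase boundaries ("userId" -> "user_Id"), then lowercase.
    ident = _CAMEL_BOUNDARY.sub("_", name.strip()).lower()
    # Replace each run of characters outside [a-z0-9_] (spaces, "▶", ...) with "_".
    ident = _INVALID_RUN.sub("_", ident)
    return ident or "_"  # assumption: never emit an empty identifier


def nested_table_name(path: Iterable[str]) -> str:
    """Join a parent -> child path into one table name, e.g. parent__child."""
    return NESTED_SEPARATOR.join(normalize_identifier(p) for p in path)


def is_valid_identifier(ident: str) -> bool:
    """Check the lowercase [a-z0-9_] rule described above."""
    return bool(_VALID.match(ident))
```

For example, `nested_table_name(["Users", "orderItems"])` yields `users__order_items`, which passes the check, while a name joined with `▶` fails it.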