Feature description
dlt's DuckLake implementation seems to require that the storage bucket_url be set and match the data_path in the catalog unless override_data_path is used (#3703). This is counterintuitive and makes it really hard to use an existing DuckLake catalog that is managed by another system where I don't know what the data_path option is set to ahead of time.
The data_path parameter is only required if a new catalog is being created. Otherwise, the data_path set in the catalog's options is used, unless override_data_path is specified. dlt's DuckLake destination should not require that data_path is set, nor require that it match whatever is in the catalog options.
Docs: Connecting — DuckLake
Error when setting a dummy bucket_url:
<class 'dlt.destinations.exceptions.DestinationConnectionError'>
Connection with client_type=DuckLakeSqlClient to dataset_name=ghgsat_data failed. Please check if you configured the credentials at all and provided the right credentials values. You can be also denied access or your internet connection may be down. The actual reason given is: DATA_PATH parameter "s3://example/" does not match existing data path in the catalog "s3://dklk-cam-king-cam-main--usw2-az1--x-s3/king_cam_main/".
You can override the DATA_PATH by setting OVERRIDE_DATA_PATH to True.
Error when bucket_url is not set:
<class 'dlt.destinations.exceptions.DestinationConnectionError'>
Connection with client_type=DuckLakeSqlClient to dataset_name=ghgsat_data failed. Please check if you configured the credentials at all and provided the right credentials values. You can be also denied access or your internet connection may be down. The actual reason given is: DATA_PATH parameter "/Users/king/Repositories/github.com/SensorUp/-sdp-defs/sdp-ghgsat-defs/king_cam_main.files/" does not match existing data path in the catalog "s3://dklk-cam-king-cam-main--usw2-az1--x-s3/king_cam_main/".
You can override the DATA_PATH by setting OVERRIDE_DATA_PATH to True.
Are you a dlt user?
Yes, I'm already a dlt user.
Use case
I want to use an existing DuckLake catalog as a target, where I don't know what the catalog's data_path option is set to. I should not have to set it, nor should dlt check its value.
Proposed solution
Don't pass a data_path argument during attach if it's not set, and don't check whether it matches the catalog's data_path. The argument is only pertinent if a new catalog is being created or if override_data_path is used.
Related issues
No response