Commit a4e3b06

Updated the adjust a schema and sql configuration docs (#2387)
* Update the adjust a schema doc
* Updated the configuration section for source arguments for `sql_database`
* Updated adjust a schema
* Updated
* Update docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md

Co-authored-by: Alena Astrakhantseva <[email protected]>
1 parent 868216c commit a4e3b06

File tree: 3 files changed, +64 −7 lines

docs/website/docs/dlt-ecosystem/destinations/postgres.md

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ To pass credentials directly, use the [explicit instance of the destination](../
 pipeline = dlt.pipeline(
     pipeline_name='chess',
     destination=dlt.destinations.postgres("postgresql://loader:<password>@localhost/dlt_data"),
-    dataset_name='chess_data'
+    dataset_name='chess_data'  # your destination schema name
 )
 ```
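For context on the comment this commit adds: `dataset_name` is the schema the tables land in on Postgres. Below is a minimal sketch (not part of the commit) of running that pipeline; the toy `players` rows and table name are illustrative assumptions:

```py
import dlt

pipeline = dlt.pipeline(
    pipeline_name="chess",
    destination=dlt.destinations.postgres("postgresql://loader:<password>@localhost/dlt_data"),
    dataset_name="chess_data",  # tables are created under the "chess_data" schema in Postgres
)

# Illustrative toy data: after running, the rows land in the chess_data.players table.
load_info = pipeline.run([{"id": 1, "username": "magnus"}], table_name="players")
print(load_info)
```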

docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md

Lines changed: 56 additions & 0 deletions
@@ -18,6 +18,10 @@ import Header from '../_source-info-header.md';
 
 Read more about sources and resources here: [General usage: source](../../../general-usage/source.md) and [General usage: resource](../../../general-usage/resource.md).
 
+:::note NOTE
+To see the complete list of source arguments for `sql_database`, [refer to this section](#arguments-for-sql_database-source).
+:::
+
 ### Example usage:
 
 :::tip
@@ -408,3 +412,55 @@ print(info)
 ```
 With the dataset above and a local PostgreSQL instance, the `ConnectorX` backend is 2x faster than the `PyArrow` backend.
 
+## Arguments for `sql_database` source
+The following arguments can be used with the `sql_database` source:
+
+`credentials` (Union[ConnectionStringCredentials, Engine, str]): Database credentials or an `sqlalchemy.Engine` instance.
+
+`schema` (Optional[str]): Name of the database schema to load (if different from the default).
+
+`metadata` (Optional[MetaData]): Optional `sqlalchemy.MetaData` instance. The `schema` argument is ignored when this is used.
+
+`table_names` (Optional[List[str]]): A list of table names to load. By default, all tables in the schema are loaded.
+
+`chunk_size` (int): Number of rows yielded in one batch. SQLAlchemy will create an additional internal row buffer of twice the chunk size.
+
+`backend` (TableBackend): Type of backend used to generate table data. One of: "sqlalchemy", "pyarrow", "pandas", and "connectorx".
+
+- "sqlalchemy" yields batches as lists of Python dictionaries, "pyarrow" and "connectorx" yield batches as Arrow tables, and "pandas" yields pandas DataFrames.
+
+- "sqlalchemy" is the default and does not require additional dependencies.
+
+- "pyarrow" creates stable destination schemas with correct data types.
+
+- "connectorx" is typically the fastest but ignores `chunk_size`, so you must handle large tables yourself.
+
+`detect_precision_hints` (bool): Deprecated; use `reflection_level` instead. Sets column precision and scale hints for supported data types in the target schema based on the columns in the source tables. Disabled by default.
+
+`reflection_level` (ReflectionLevel): Specifies how much information is reflected from the source database schema.
+
+- "minimal": Only table names, nullability, and primary keys are reflected. Data types are inferred from the data. This is the default option.
+
+- "full": Data types are reflected on top of "minimal". `dlt` coerces the data into the reflected types if necessary.
+
+- "full_with_precision": Sets precision and scale on supported data types (i.e., decimal, text, binary). Creates big and regular integer types.
+
+`defer_table_reflect` (bool): Connects and reflects the table schema only when yielding data. Requires `table_names` to be explicitly passed.
+Enable this option when running on Airflow. Available on dlt 0.4.4 and later.
+
+`table_adapter_callback` (Callable): Receives each reflected table. May be used to modify the list of columns that will be selected.
+
+`backend_kwargs` (**kwargs): kwargs passed to the table backend, e.g., "conn" is used to pass a specialized connection string to ConnectorX.
+
+`include_views` (bool): Reflect views as well as tables. Note that view names included in `table_names` are always included regardless of this setting. Set to False by default.
+
+`type_adapter_callback` (Optional[Callable]): Callable to override type inference when reflecting columns.
+The argument is a single sqlalchemy data type (a `TypeEngine` instance), and it should return another sqlalchemy data type or `None` (the type will then be inferred from the data).
+
+`query_adapter_callback` (Optional[Callable[[Select, Table], Select]]): Callable to override the SELECT query used to fetch data from the table. The callback receives the sqlalchemy `Select` and the corresponding `Table`, `Incremental`, and `Engine` objects and should return the modified `Select` or `Text`.
+
+`resolve_foreign_keys` (bool): Translates foreign keys in the same schema to `references` table hints.
+May incur additional database calls, as all referenced tables are reflected.
+
+`engine_adapter_callback` (Callable[[Engine], Engine]): Callback to configure and modify the `Engine` instance that will be used to open a connection, e.g., to set the transaction isolation level.
+
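To make the argument list above concrete, here is a minimal sketch (not part of the commit) of passing several of these arguments to the `sql_database` source. The `dlt.sources.sql_database` import path, connection string, and table names are illustrative assumptions:

```py
import dlt
from dlt.sources.sql_database import sql_database

# Reflect only two tables, stream them as Arrow batches of 50,000 rows,
# and reflect full data types from the source database.
source = sql_database(
    credentials="postgresql://loader:<password>@localhost:5432/dlt_data",  # assumed connection string
    schema="public",
    table_names=["family", "genome"],  # illustrative table names
    backend="pyarrow",
    chunk_size=50_000,
    reflection_level="full",
)

pipeline = dlt.pipeline(pipeline_name="sql_to_duckdb", destination="duckdb", dataset_name="sql_data")
print(pipeline.run(source))
```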

docs/website/docs/walkthroughs/adjust-a-schema.md

Lines changed: 7 additions & 6 deletions
@@ -36,8 +36,8 @@ schemas
 |---export/
 ```
 
-Rather than providing the paths in the `dlt.pipeline` function, you can also set them
-in the `config.toml` file:
+Rather than providing the paths in the `dlt.pipeline` function, you can also set them at
+the beginning of the `config.toml` file:
 
 ```toml
 export_schema_path="schemas/export"
@@ -74,10 +74,11 @@ You should keep the import schema as simple as possible and let `dlt` do the rest.
 In the next steps, we'll experiment a lot; you will be warned to set `dev_mode=True` until we are done experimenting.
 
 :::caution
-`dlt` will **not modify** tables after they are created.
-So if you have a YAML file, and you change it (e.g., change a data type or add a hint),
-then you need to **delete the dataset**
-or set `dev_mode=True`:
+dlt does **not modify** existing columns in a table after creation. While new columns can be added, changes to existing
+columns (such as altering data types or adding hints) will not take effect automatically.
+
+If you modify a YAML schema file, you must either delete the dataset, enable `dev_mode=True`, or use one of the pipeline
+[Refresh options](../general-usage/pipeline#refresh-pipeline-data-and-state) to apply the changes.
 ```py
 dlt.pipeline(
     import_schema_path="schemas/import",
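The snippet above is truncated in the diff view. As a minimal sketch (not part of the commit), the pattern the rewritten caution describes looks roughly like this, with an illustrative pipeline name and destination:

```py
import dlt

# After editing the YAML under schemas/import/, either delete the dataset or
# run with dev_mode=True so the modified schema is applied to a fresh dataset.
pipeline = dlt.pipeline(
    pipeline_name="chess_pipeline",        # illustrative name
    destination="duckdb",                  # illustrative destination
    dataset_name="games_data",
    import_schema_path="schemas/import",
    export_schema_path="schemas/export",
    dev_mode=True,
)
```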
