Commit a4e3b06

Updated the adjust a schema and sql configuration docs (#2387)
* Update the adjust a schema doc
* Updated the configuration section for source arguments for `sql_database`
* Updated adjust a schema
* Updated
* Update docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md

Co-authored-by: Alena Astrakhantseva <[email protected]>
1 parent 868216c commit a4e3b06

File tree: 3 files changed, +64 −7 lines

docs/website/docs/dlt-ecosystem/destinations/postgres.md

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ To pass credentials directly, use the [explicit instance of the destination](../
 pipeline = dlt.pipeline(
     pipeline_name='chess',
     destination=dlt.destinations.postgres("postgresql://loader:<password>@localhost/dlt_data"),
-    dataset_name='chess_data'
+    dataset_name='chess_data'  # your destination schema name
 )
 ```
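For context on the comment this commit adds: `dataset_name` is the schema the tables land in on Postgres. Below is a minimal sketch (not part of the commit) of running that pipeline; the toy `players` rows and table name are illustrative assumptions:

```py
import dlt

pipeline = dlt.pipeline(
    pipeline_name="chess",
    destination=dlt.destinations.postgres("postgresql://loader:<password>@localhost/dlt_data"),
    dataset_name="chess_data",  # tables are created under the "chess_data" schema in Postgres
)

# Illustrative toy data: after running, the rows land in the chess_data.players table.
load_info = pipeline.run([{"id": 1, "username": "magnus"}], table_name="players")
print(load_info)
```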

docs/website/docs/dlt-ecosystem/verified-sources/sql_database/configuration.md

Lines changed: 56 additions & 0 deletions
@@ -18,6 +18,10 @@ import Header from '../_source-info-header.md';
 
 Read more about sources and resources here: [General usage: source](../../../general-usage/source.md) and [General usage: resource](../../../general-usage/resource.md).
 
+:::note NOTE
+To see the complete list of source arguments for `sql_database`, [refer to this section](#arguments-for-sql_database-source).
+:::
+
 ### Example usage:
 
 :::tip
@@ -408,3 +412,55 @@ print(info)
 ```
 With the dataset above and a local PostgreSQL instance, the `ConnectorX` backend is 2x faster than the `PyArrow` backend.
 
+## Arguments for `sql_database` source
+The following arguments can be used with the `sql_database` source:
+
+`credentials` (Union[ConnectionStringCredentials, Engine, str]): Database credentials or an `sqlalchemy.Engine` instance.
+
+`schema` (Optional[str]): Name of the database schema to load (if different from the default).
+
+`metadata` (Optional[MetaData]): Optional `sqlalchemy.MetaData` instance. The `schema` argument is ignored when this is used.
+
+`table_names` (Optional[List[str]]): A list of table names to load. By default, all tables in the schema are loaded.
+
+`chunk_size` (int): Number of rows yielded in one batch. SQLAlchemy will create an additional internal row buffer of twice the chunk size.
+
+`backend` (TableBackend): Type of backend used to generate table data. One of: "sqlalchemy", "pyarrow", "pandas", and "connectorx".
+
+- "sqlalchemy" yields batches as lists of Python dictionaries, "pyarrow" and "connectorx" yield batches as Arrow tables, and "pandas" yields pandas DataFrames.
+
+- "sqlalchemy" is the default and does not require additional dependencies.
+
+- "pyarrow" creates stable destination schemas with correct data types.
+
+- "connectorx" is typically the fastest but ignores `chunk_size`, so you must handle large tables yourself.
+
+`detect_precision_hints` (bool): Deprecated; use `reflection_level` instead. Sets column precision and scale hints for supported data types in the target schema based on the columns in the source tables. Disabled by default.
+
+`reflection_level` (ReflectionLevel): Specifies how much information is reflected from the source database schema.
+
+- "minimal": Only table names, nullability, and primary keys are reflected. Data types are inferred from the data. This is the default option.
+
+- "full": Data types are reflected on top of "minimal". `dlt` coerces the data into the reflected types if necessary.
+
+- "full_with_precision": Sets precision and scale on supported data types (i.e., decimal, text, binary). Creates big and regular integer types.
+
+`defer_table_reflect` (bool): Connects and reflects the table schema only when yielding data. Requires `table_names` to be explicitly passed.
+Enable this option when running on Airflow. Available on dlt 0.4.4 and later.
+
+`table_adapter_callback` (Callable): Receives each reflected table. May be used to modify the list of columns that will be selected.
+
+`backend_kwargs` (**kwargs): kwargs passed to the table backend, e.g., "conn" is used to pass a specialized connection string to ConnectorX.
+
+`include_views` (bool): Reflect views as well as tables. Note that view names included in `table_names` are always included regardless of this setting. Set to False by default.
+
+`type_adapter_callback` (Optional[Callable]): Callable to override type inference when reflecting columns.
+The argument is a single sqlalchemy data type (a `TypeEngine` instance), and it should return another sqlalchemy data type or `None` (the type will then be inferred from the data).
+
+`query_adapter_callback` (Optional[Callable[[Select, Table], Select]]): Callable to override the SELECT query used to fetch data from the table. The callback receives the sqlalchemy `Select` and the corresponding `Table`, `Incremental`, and `Engine` objects and should return the modified `Select` or `Text`.
+
+`resolve_foreign_keys` (bool): Translates foreign keys in the same schema to `references` table hints.
+May incur additional database calls, as all referenced tables are reflected.
+
+`engine_adapter_callback` (Callable[[Engine], Engine]): Callback to configure and modify the `Engine` instance that will be used to open a connection, e.g., to set the transaction isolation level.
+
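To make the argument list above concrete, here is a minimal sketch (not part of the commit) of passing several of these arguments to the `sql_database` source. The `dlt.sources.sql_database` import path, connection string, and table names are illustrative assumptions:

```py
import dlt
from dlt.sources.sql_database import sql_database

# Reflect only two tables, stream them as Arrow batches of 50,000 rows,
# and reflect full data types from the source database.
source = sql_database(
    credentials="postgresql://loader:<password>@localhost:5432/dlt_data",  # assumed connection string
    schema="public",
    table_names=["family", "genome"],  # illustrative table names
    backend="pyarrow",
    chunk_size=50_000,
    reflection_level="full",
)

pipeline = dlt.pipeline(pipeline_name="sql_to_duckdb", destination="duckdb", dataset_name="sql_data")
print(pipeline.run(source))
```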

docs/website/docs/walkthroughs/adjust-a-schema.md

Lines changed: 7 additions & 6 deletions
@@ -36,8 +36,8 @@ schemas
 |---export/
 ```
 
-Rather than providing the paths in the `dlt.pipeline` function, you can also set them
-in the `config.toml` file:
+Rather than providing the paths in the `dlt.pipeline` function, you can also set them at
+the beginning of the `config.toml` file:
 
 ```toml
 export_schema_path="schemas/export"
@@ -74,10 +74,11 @@ You should keep the import schema as simple as possible and let `dlt` do the rest.
 In the next steps, we'll experiment a lot; you will be warned to set `dev_mode=True` until we are done experimenting.
 
 :::caution
-`dlt` will **not modify** tables after they are created.
-So if you have a YAML file, and you change it (e.g., change a data type or add a hint),
-then you need to **delete the dataset**
-or set `dev_mode=True`:
+dlt does **not modify** existing columns in a table after creation. While new columns can be added, changes to existing
+columns (such as altering data types or adding hints) will not take effect automatically.
+
+If you modify a YAML schema file, you must either delete the dataset, enable `dev_mode=True`, or use one of the pipeline
+[Refresh options](../general-usage/pipeline#refresh-pipeline-data-and-state) to apply the changes.
 ```py
 dlt.pipeline(
     import_schema_path="schemas/import",
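The snippet above is truncated in the diff view. As a minimal sketch (not part of the commit), the pattern the rewritten caution describes looks roughly like this, with an illustrative pipeline name and destination:

```py
import dlt

# After editing the YAML under schemas/import/, either delete the dataset or
# run with dev_mode=True so the modified schema is applied to a fresh dataset.
pipeline = dlt.pipeline(
    pipeline_name="chess_pipeline",        # illustrative name
    destination="duckdb",                  # illustrative destination
    dataset_name="games_data",
    import_schema_path="schemas/import",
    export_schema_path="schemas/export",
    dev_mode=True,
)
```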
