
Missing numpy dependency when running with PyArrow >= 18 #2380

Open
kien-truong opened this issue Mar 6, 2025 · 2 comments · May be fixed by #2397
Assignees
Labels
bug Something isn't working

Comments

@kien-truong

dlt version

1.7

Describe the problem

Because PyArrow >= 18 moved numpy to its optional runtime dependencies, pipelines that use SQL sources with the pyarrow backend fail with an import error:

ModuleNotFoundError: No module named 'numpy'

Expected behavior

Pipelines using the pyarrow backend work with pyarrow >= 18.

Steps to reproduce

  • Install dlt with pyarrow >= 18
  • Run a dlt pipeline using SQL source and pyarrow backend

Operating system

Linux

Runtime environment

Local

Python version

3.12

dlt data source

Any SQL source

dlt destination

No response

Other deployment details

No response

Additional information

No response

@rudolfix rudolfix moved this from Todo to Planned in dlt core library Mar 10, 2025
@rudolfix rudolfix added the bug Something isn't working label Mar 10, 2025
@rudolfix
Collaborator

rudolfix commented Mar 10, 2025

@kien-truong thanks for pointing this out. numpy should be optional and dlt should not raise here. Could you paste the full stack trace?


note on how to fix this:

  1. make sure sql_database works without numpy (arrow backend). There must be a top-level import somewhere after the refactor!
  2. add numpy to arrow extras explicitly
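One common way to implement point 1 is to guard the optional import and fail with an actionable message instead of a bare `ModuleNotFoundError`. A minimal sketch, assuming a generic helper (`require_optional` is a hypothetical name for illustration, not dlt's actual API):

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise a clear, actionable error.

    Hypothetical helper: pyarrow >= 18 no longer pulls in numpy, so a
    lazy import like this turns a bare ModuleNotFoundError into a
    message that tells the user how to fix their environment.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            f"{module_name} is required for this backend but is not installed. "
            f"{install_hint}"
        ) from exc
```

Usage inside the failing function would then be `np = require_optional("numpy", "Install it with: pip install numpy")` instead of a plain `import numpy as np`.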

@kien-truong
Author

My stack trace is mangled, but the exception comes from this line:

def transpose_rows_to_columns(
    rows: TDataItems, column_names: Iterable[str]
) -> dict[str, Any]:  # dict[str, np.ndarray]
    """Transpose rows (data items) into columns (numpy arrays). Returns a dictionary of {column_name: column_data}
    Uses pandas if available. Otherwise, use numpy, which is slower
    """
    import numpy as np

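For context, the transpose that the quoted function performs can also be done in pure Python with no numpy or pandas at all (slower, but dependency-free). This is an illustrative sketch of the same row-to-column semantics, not dlt's implementation:

```python
from typing import Any, Iterable, Sequence


def transpose_rows_to_columns_pure(
    rows: Sequence[Sequence[Any]], column_names: Iterable[str]
) -> dict[str, list[Any]]:
    """Transpose rows into {column_name: list_of_values}.

    Pure-Python fallback for illustration only; dlt's real function
    uses pandas or numpy because they are much faster on large batches.
    """
    names = list(column_names)
    columns: dict[str, list[Any]] = {name: [] for name in names}
    for row in rows:
        # Pair each value with its column name, positionally.
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns
```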
@zilto zilto linked a pull request Mar 11, 2025 that will close this issue