Inability to run conda-unpack in Spark environment makes S3 access via pyarrow broken #442

@kopczynski-9livesdata

Checklist

  • I added a descriptive title
  • I searched open reports and couldn't find a duplicate

What happened?

S3 access via pyarrow's filesystem doesn't work when executed in a Spark environment created via conda-pack. For simplicity, my example code below doesn't use Spark but the behavior is the same.

I'm trying to run this code:

from pyarrow.fs import S3FileSystem
import ssl

print(ssl.get_default_verify_paths())
print(S3FileSystem.from_uri("s3://s3-bucket/"))

I'm creating a conda package like so:

(base) bash-4.4# conda create --name pyarrow_conda_from_docker -c conda-forge conda-pack pyarrow python=3.10
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3/envs/pyarrow_conda_from_docker

  added / updated specs:
    - conda-pack
    - pyarrow
    - python=3.10


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |            2_gnu          23 KB  conda-forge
    aws-c-auth-0.9.1           |       h48c9088_3         120 KB  conda-forge
    aws-c-cal-0.9.2            |       he7b75e1_1          50 KB  conda-forge
    aws-c-common-0.12.4        |       hb03c661_0         231 KB  conda-forge
    aws-c-compression-0.3.1    |       h92c474e_6          22 KB  conda-forge
    aws-c-event-stream-0.5.6   |       h82d11aa_3          58 KB  conda-forge
    aws-c-http-0.10.4          |       h94feff3_3         219 KB  conda-forge
    aws-c-io-0.22.0            |       h57f3b0d_1         177 KB  conda-forge
    aws-c-mqtt-0.13.3          |       h2b1cf8c_6         211 KB  conda-forge
    aws-c-s3-0.8.6             |       h4e5ac4b_5         134 KB  conda-forge
    aws-c-sdkutils-0.2.4       |       h92c474e_1          58 KB  conda-forge
    aws-checksums-0.2.7        |       h92c474e_2          75 KB  conda-forge
    aws-crt-cpp-0.34.4         |       h60c762c_0         399 KB  conda-forge
    aws-sdk-cpp-1.11.606       |       h32384e2_4         3.2 MB  conda-forge
    azure-core-cpp-1.16.0      |       h3a458e0_1         344 KB  conda-forge
    azure-identity-cpp-1.12.0  |       ha729027_0         236 KB  conda-forge
    azure-storage-blobs-cpp-12.14.0|       hb1c9500_1         564 KB  conda-forge
    azure-storage-common-cpp-12.10.0|       h4bb41a7_3         146 KB  conda-forge
    azure-storage-files-datalake-cpp-12.12.0|       h8b27e44_3         293 KB  conda-forge
    bzip2-1.0.8                |       hda65f42_8         254 KB  conda-forge
    c-ares-1.34.5              |       hb9d3cd8_0         202 KB  conda-forge
    ca-certificates-2025.10.5  |       hbd8a1cb_0         152 KB  conda-forge
    conda-pack-0.8.1           |     pyhd8ed1ab_1          34 KB  conda-forge
    gflags-2.2.2               |    h5888daf_1005         117 KB  conda-forge
    glog-0.7.1                 |       hbabe93e_0         140 KB  conda-forge
    icu-75.1                   |       he02047a_0        11.6 MB  conda-forge
    keyutils-1.6.3             |       hb9d3cd8_0         131 KB  conda-forge
    krb5-1.21.3                |       h659f571_0         1.3 MB  conda-forge
    ld_impl_linux-64-2.44      |       ha97dd6f_2         730 KB  conda-forge
    libabseil-20250512.1       | cxx17_hba17884_0         1.2 MB  conda-forge
    libarrow-21.0.0            |   h56a6dad_8_cpu         5.9 MB  conda-forge
    libarrow-acero-21.0.0      |   h635bf11_8_cpu         568 KB  conda-forge
    libarrow-compute-21.0.0    |   h8c2c5c3_8_cpu         2.9 MB  conda-forge
    libarrow-dataset-21.0.0    |   h635bf11_8_cpu         566 KB  conda-forge
    libarrow-substrait-21.0.0  |   h3f74fd7_8_cpu         472 KB  conda-forge
    libbrotlicommon-1.1.0      |       hb03c661_4          68 KB  conda-forge
    libbrotlidec-1.1.0         |       hb03c661_4          33 KB  conda-forge
    libbrotlienc-1.1.0         |       hb03c661_4         283 KB  conda-forge
    libcrc32c-1.1.2            |       h9c3ff4c_0          20 KB  conda-forge
    libcurl-8.14.1             |       h332b0f4_0         439 KB  conda-forge
    libedit-3.1.20250104       | pl5321h7949ede_0         132 KB  conda-forge
    libev-4.33                 |       hd590300_2         110 KB  conda-forge
    libevent-2.1.12            |       hf998b51_1         417 KB  conda-forge
    libexpat-2.7.1             |       hecca717_0          73 KB  conda-forge
    libffi-3.4.6               |       h2dba641_1          56 KB  conda-forge
    libgcc-15.2.0              |       h767d61c_7         803 KB  conda-forge
    libgcc-ng-15.2.0           |       h69a702a_7          29 KB  conda-forge
    libgomp-15.2.0             |       h767d61c_7         437 KB  conda-forge
    libgoogle-cloud-2.39.0     |       hdb79228_0         1.2 MB  conda-forge
    libgoogle-cloud-storage-2.39.0|       hdbdcf42_0         785 KB  conda-forge
    libgrpc-1.73.1             |       h1e535eb_0         8.0 MB  conda-forge
    libiconv-1.18              |       h3b78370_2         772 KB  conda-forge
    liblzma-5.8.1              |       hb9d3cd8_2         110 KB  conda-forge
    libnghttp2-1.67.0          |       had1ee68_0         651 KB  conda-forge
    libnsl-2.0.1               |       hb9d3cd8_1          33 KB  conda-forge
    libopentelemetry-cpp-1.21.0|       hb9b0907_1         865 KB  conda-forge
    libopentelemetry-cpp-headers-1.21.0|       ha770c72_1         355 KB  conda-forge
    libparquet-21.0.0          |   h790f06f_8_cpu         1.3 MB  conda-forge
    libprotobuf-6.31.1         |       h9ef548d_1         3.8 MB  conda-forge
    libre2-11-2025.08.12       |       h7b12aa8_1         206 KB  conda-forge
    libsqlite-3.50.4           |       h0c1763c_0         911 KB  conda-forge
    libssh2-1.11.1             |       hcf80075_0         298 KB  conda-forge
    libstdcxx-15.2.0           |       h8f9b012_7         3.7 MB  conda-forge
    libstdcxx-ng-15.2.0        |       h4852527_7          29 KB  conda-forge
    libthrift-0.22.0           |       h454ac66_1         414 KB  conda-forge
    libutf8proc-2.11.0         |       hb04c3b8_0          84 KB  conda-forge
    libuuid-2.41.2             |       he9a06e4_0          36 KB  conda-forge
    libxcrypt-4.4.36           |       hd590300_1          98 KB  conda-forge
    libxml2-2.15.0             |       h26afc86_1          44 KB  conda-forge
    libxml2-16-2.15.0          |       ha9997c6_1         543 KB  conda-forge
    libzlib-1.3.1              |       hb9d3cd8_2          60 KB  conda-forge
    lz4-c-1.10.0               |       h5888daf_1         163 KB  conda-forge
    ncurses-6.5                |       h2d0b736_3         871 KB  conda-forge
    nlohmann_json-3.12.0       |       h54a6638_1         133 KB  conda-forge
    openssl-3.5.4              |       h26f9b46_0         3.0 MB  conda-forge
    orc-2.2.1                  |       hd747db4_0         1.3 MB  conda-forge
    pip-25.2                   |     pyh8b19718_0         1.1 MB  conda-forge
    prometheus-cpp-1.3.0       |       ha5d0236_0         195 KB  conda-forge
    pyarrow-21.0.0             |  py310hff52083_1          26 KB  conda-forge
    pyarrow-core-21.0.0        |py310h923f568_1_cpu         5.0 MB  conda-forge
    python-3.10.18             |hd6af730_0_cpython        23.9 MB  conda-forge
    python_abi-3.10            |          8_cp310           7 KB  conda-forge
    re2-2025.08.12             |       h5301d42_1          27 KB  conda-forge
    readline-8.2               |       h8c095d6_2         276 KB  conda-forge
    s2n-1.5.26                 |       h5ac9029_0         382 KB  conda-forge
    setuptools-80.9.0          |     pyhff2d567_0         731 KB  conda-forge
    snappy-1.2.2               |       h03e3b7b_0          45 KB  conda-forge
    tk-8.6.13                  |noxft_hd72426e_102         3.1 MB  conda-forge
    tzdata-2025b               |       h78e105d_0         120 KB  conda-forge
    wheel-0.45.1               |     pyhd8ed1ab_1          61 KB  conda-forge
    zlib-1.3.1                 |       hb9d3cd8_2          90 KB  conda-forge
    zstd-1.5.7                 |       hb8e6e7a_2         554 KB  conda-forge
    ------------------------------------------------------------
                                           Total:       100.7 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  aws-c-auth         conda-forge/linux-64::aws-c-auth-0.9.1-h48c9088_3 
  aws-c-cal          conda-forge/linux-64::aws-c-cal-0.9.2-he7b75e1_1 
  aws-c-common       conda-forge/linux-64::aws-c-common-0.12.4-hb03c661_0 
  aws-c-compression  conda-forge/linux-64::aws-c-compression-0.3.1-h92c474e_6 
  aws-c-event-stream conda-forge/linux-64::aws-c-event-stream-0.5.6-h82d11aa_3 
  aws-c-http         conda-forge/linux-64::aws-c-http-0.10.4-h94feff3_3 
  aws-c-io           conda-forge/linux-64::aws-c-io-0.22.0-h57f3b0d_1 
  aws-c-mqtt         conda-forge/linux-64::aws-c-mqtt-0.13.3-h2b1cf8c_6 
  aws-c-s3           conda-forge/linux-64::aws-c-s3-0.8.6-h4e5ac4b_5 
  aws-c-sdkutils     conda-forge/linux-64::aws-c-sdkutils-0.2.4-h92c474e_1 
  aws-checksums      conda-forge/linux-64::aws-checksums-0.2.7-h92c474e_2 
  aws-crt-cpp        conda-forge/linux-64::aws-crt-cpp-0.34.4-h60c762c_0 
  aws-sdk-cpp        conda-forge/linux-64::aws-sdk-cpp-1.11.606-h32384e2_4 
  azure-core-cpp     conda-forge/linux-64::azure-core-cpp-1.16.0-h3a458e0_1 
  azure-identity-cpp conda-forge/linux-64::azure-identity-cpp-1.12.0-ha729027_0 
  azure-storage-blo~ conda-forge/linux-64::azure-storage-blobs-cpp-12.14.0-hb1c9500_1 
  azure-storage-com~ conda-forge/linux-64::azure-storage-common-cpp-12.10.0-h4bb41a7_3 
  azure-storage-fil~ conda-forge/linux-64::azure-storage-files-datalake-cpp-12.12.0-h8b27e44_3 
  bzip2              conda-forge/linux-64::bzip2-1.0.8-hda65f42_8 
  c-ares             conda-forge/linux-64::c-ares-1.34.5-hb9d3cd8_0 
  ca-certificates    conda-forge/noarch::ca-certificates-2025.10.5-hbd8a1cb_0 
  conda-pack         conda-forge/noarch::conda-pack-0.8.1-pyhd8ed1ab_1 
  gflags             conda-forge/linux-64::gflags-2.2.2-h5888daf_1005 
  glog               conda-forge/linux-64::glog-0.7.1-hbabe93e_0 
  icu                conda-forge/linux-64::icu-75.1-he02047a_0 
  keyutils           conda-forge/linux-64::keyutils-1.6.3-hb9d3cd8_0 
  krb5               conda-forge/linux-64::krb5-1.21.3-h659f571_0 
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.44-ha97dd6f_2 
  libabseil          conda-forge/linux-64::libabseil-20250512.1-cxx17_hba17884_0 
  libarrow           conda-forge/linux-64::libarrow-21.0.0-h56a6dad_8_cpu 
  libarrow-acero     conda-forge/linux-64::libarrow-acero-21.0.0-h635bf11_8_cpu 
  libarrow-compute   conda-forge/linux-64::libarrow-compute-21.0.0-h8c2c5c3_8_cpu 
  libarrow-dataset   conda-forge/linux-64::libarrow-dataset-21.0.0-h635bf11_8_cpu 
  libarrow-substrait conda-forge/linux-64::libarrow-substrait-21.0.0-h3f74fd7_8_cpu 
  libbrotlicommon    conda-forge/linux-64::libbrotlicommon-1.1.0-hb03c661_4 
  libbrotlidec       conda-forge/linux-64::libbrotlidec-1.1.0-hb03c661_4 
  libbrotlienc       conda-forge/linux-64::libbrotlienc-1.1.0-hb03c661_4 
  libcrc32c          conda-forge/linux-64::libcrc32c-1.1.2-h9c3ff4c_0 
  libcurl            conda-forge/linux-64::libcurl-8.14.1-h332b0f4_0 
  libedit            conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0 
  libev              conda-forge/linux-64::libev-4.33-hd590300_2 
  libevent           conda-forge/linux-64::libevent-2.1.12-hf998b51_1 
  libexpat           conda-forge/linux-64::libexpat-2.7.1-hecca717_0 
  libffi             conda-forge/linux-64::libffi-3.4.6-h2dba641_1 
  libgcc             conda-forge/linux-64::libgcc-15.2.0-h767d61c_7 
  libgcc-ng          conda-forge/linux-64::libgcc-ng-15.2.0-h69a702a_7 
  libgomp            conda-forge/linux-64::libgomp-15.2.0-h767d61c_7 
  libgoogle-cloud    conda-forge/linux-64::libgoogle-cloud-2.39.0-hdb79228_0 
  libgoogle-cloud-s~ conda-forge/linux-64::libgoogle-cloud-storage-2.39.0-hdbdcf42_0 
  libgrpc            conda-forge/linux-64::libgrpc-1.73.1-h1e535eb_0 
  libiconv           conda-forge/linux-64::libiconv-1.18-h3b78370_2 
  liblzma            conda-forge/linux-64::liblzma-5.8.1-hb9d3cd8_2 
  libnghttp2         conda-forge/linux-64::libnghttp2-1.67.0-had1ee68_0 
  libnsl             conda-forge/linux-64::libnsl-2.0.1-hb9d3cd8_1 
  libopentelemetry-~ conda-forge/linux-64::libopentelemetry-cpp-1.21.0-hb9b0907_1 
  libopentelemetry-~ conda-forge/linux-64::libopentelemetry-cpp-headers-1.21.0-ha770c72_1 
  libparquet         conda-forge/linux-64::libparquet-21.0.0-h790f06f_8_cpu 
  libprotobuf        conda-forge/linux-64::libprotobuf-6.31.1-h9ef548d_1 
  libre2-11          conda-forge/linux-64::libre2-11-2025.08.12-h7b12aa8_1 
  libsqlite          conda-forge/linux-64::libsqlite-3.50.4-h0c1763c_0 
  libssh2            conda-forge/linux-64::libssh2-1.11.1-hcf80075_0 
  libstdcxx          conda-forge/linux-64::libstdcxx-15.2.0-h8f9b012_7 
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-15.2.0-h4852527_7 
  libthrift          conda-forge/linux-64::libthrift-0.22.0-h454ac66_1 
  libutf8proc        conda-forge/linux-64::libutf8proc-2.11.0-hb04c3b8_0 
  libuuid            conda-forge/linux-64::libuuid-2.41.2-he9a06e4_0 
  libxcrypt          conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 
  libxml2            conda-forge/linux-64::libxml2-2.15.0-h26afc86_1 
  libxml2-16         conda-forge/linux-64::libxml2-16-2.15.0-ha9997c6_1 
  libzlib            conda-forge/linux-64::libzlib-1.3.1-hb9d3cd8_2 
  lz4-c              conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1 
  ncurses            conda-forge/linux-64::ncurses-6.5-h2d0b736_3 
  nlohmann_json      conda-forge/linux-64::nlohmann_json-3.12.0-h54a6638_1 
  openssl            conda-forge/linux-64::openssl-3.5.4-h26f9b46_0 
  orc                conda-forge/linux-64::orc-2.2.1-hd747db4_0 
  pip                conda-forge/noarch::pip-25.2-pyh8b19718_0 
  prometheus-cpp     conda-forge/linux-64::prometheus-cpp-1.3.0-ha5d0236_0 
  pyarrow            conda-forge/linux-64::pyarrow-21.0.0-py310hff52083_1 
  pyarrow-core       conda-forge/linux-64::pyarrow-core-21.0.0-py310h923f568_1_cpu 
  python             conda-forge/linux-64::python-3.10.18-hd6af730_0_cpython 
  python_abi         conda-forge/noarch::python_abi-3.10-8_cp310 
  re2                conda-forge/linux-64::re2-2025.08.12-h5301d42_1 
  readline           conda-forge/linux-64::readline-8.2-h8c095d6_2 
  s2n                conda-forge/linux-64::s2n-1.5.26-h5ac9029_0 
  setuptools         conda-forge/noarch::setuptools-80.9.0-pyhff2d567_0 
  snappy             conda-forge/linux-64::snappy-1.2.2-h03e3b7b_0 
  tk                 conda-forge/linux-64::tk-8.6.13-noxft_hd72426e_102 
  tzdata             conda-forge/noarch::tzdata-2025b-h78e105d_0 
  wheel              conda-forge/noarch::wheel-0.45.1-pyhd8ed1ab_1 
  zlib               conda-forge/linux-64::zlib-1.3.1-hb9d3cd8_2 
  zstd               conda-forge/linux-64::zstd-1.5.7-hb8e6e7a_2 


Proceed ([y]/n)? y 


Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate pyarrow_conda_from_docker
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) bash-4.4# conda activate pyarrow_conda_from_docker
(pyarrow_conda_from_docker) bash-4.4# conda-pack -n pyarrow_conda_from_docker --exclude */__pycache__/* -o /conda_env/indocker/pyarrow_conda_from_docker.tar.gz
/root/miniconda3/envs/pyarrow_conda_from_docker/lib/python3.10/site-packages/conda_pack/core.py:16: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Collecting packages...
Packing environment at '/root/miniconda3/envs/pyarrow_conda_from_docker' to '/conda_env/indocker/pyarrow_conda_from_docker.tar.gz'
[########################################] | 100% Completed |  7.4s

Then I unpack the archive:

tkopczynski@dotdata ~/t/c/indocker> tar zxf pyarrow_conda_from_docker.tar.gz -C pyarrow_conda/

I try to check my S3 connectivity:

bash-4.4# /conda_env/indocker/pyarrow_conda/bin/python /conda_env/main.py 
DefaultVerifyPaths(cafile=None, capath=None, openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/home/conda/feedstock_root/build_artifacts/openssl_split_1759323449041/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/home/conda/feedstock_root/build_artifacts/openssl_split_1759323449041/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/ssl/certs')
Traceback (most recent call last):
  File "/conda_env/main.py", line 5, in <module>
    print(S3FileSystem.from_uri("s3://s3-bucket/"))
  File "pyarrow/_fs.pyx", line 502, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/_fs.pyx", line 457, in pyarrow._fs.FileSystem._native_from_uri
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When resolving region for bucket 's3-bucket': AWS Error NETWORK_CONNECTION during HeadBucket operation: curlCode: 77, Problem with the SSL CA cert (path? access rights?); Details: error setting certificate file: /home/conda/feedstock_root/build_artifacts/curl_split_recipe_1749032811691/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p

It fails with an SSL error.
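
To make the symptom easier to spot, here is a minimal diagnostic sketch. It assumes the `_h_env_placehold` substring seen in the output above is a reliable sign of a build-time prefix that was never rewritten; the helper names `unrelocated_paths` and `default_ca_paths` are hypothetical and exist only for this illustration.

```python
import os
import ssl

# Assumption: this substring (copied from the output above) marks a
# build-time prefix that has not been rewritten to the real location.
PLACEHOLDER_MARKER = "_h_env_placehold"


def unrelocated_paths(paths):
    """Return the entries of `paths` (name -> path) that still contain the marker."""
    return {name: p for name, p in paths.items() if p and PLACEHOLDER_MARKER in p}


def default_ca_paths():
    """Collect the compiled-in OpenSSL CA locations reported by the ssl module."""
    v = ssl.get_default_verify_paths()
    return {"openssl_cafile": v.openssl_cafile, "openssl_capath": v.openssl_capath}


if __name__ == "__main__":
    paths = default_ca_paths()
    for name, p in paths.items():
        state = "exists" if p is not None and os.path.exists(p) else "missing"
        print(f"{name}: {p} ({state})")
    print("still pointing at the build prefix:", unrelocated_paths(paths))
```

Run with the unpacked environment's `bin/python`, this only inspects what Python's `ssl` module reports; it does not check the CA path that curl uses, which is the one named in the traceback.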

However, it works after I run conda-unpack:

(pyarrow_conda) bash-4.4# conda-unpack
(pyarrow_conda) bash-4.4# /conda_env/indocker/pyarrow_conda/bin/python /conda_env/main.py
DefaultVerifyPaths(cafile='/conda_env/indocker/pyarrow_conda/ssl/cert.pem', capath='/conda_env/indocker/pyarrow_conda/ssl/certs', openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/conda_env/indocker/pyarrow_conda/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/conda_env/indocker/pyarrow_conda/ssl/certs')
(<pyarrow._s3fs.S3FileSystem object at 0x767fc08f90f0>, 's3-bucket')
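
After conda-unpack, both CA paths sit under the environment prefix (`<prefix>/ssl/cert.pem` and `<prefix>/ssl/certs`). The sketch below derives those locations from `sys.prefix` and exports them through the `SSL_CERT_FILE` and `SSL_CERT_DIR` variables named in the output above. This is an untested assumption rather than a confirmed workaround: it changes what Python's `ssl` module reports, but I have not verified that the curl library used by pyarrow's S3 client honors these variables. The helper names are hypothetical.

```python
import os
import ssl
import sys


def expected_ca_paths(prefix):
    """CA locations relative to an environment prefix, as seen after conda-unpack."""
    return {
        "SSL_CERT_FILE": os.path.join(prefix, "ssl", "cert.pem"),
        "SSL_CERT_DIR": os.path.join(prefix, "ssl", "certs"),
    }


def point_openssl_at_env(prefix=sys.prefix, environ=os.environ):
    """Set each variable whose target exists under `prefix`; return what was set."""
    applied = {}
    for var, path in expected_ca_paths(prefix).items():
        if os.path.exists(path):
            environ[var] = path
            applied[var] = path
    return applied


if __name__ == "__main__":
    # Must run before anything opens a TLS connection in this process.
    print(point_openssl_at_env())
    print(ssl.get_default_verify_paths())
```

Passing `environ` explicitly keeps the helper easy to try against a plain dict before touching the real process environment.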

Unfortunately, it is currently impossible to run conda-unpack in a Spark environment, as discussed in #89. Also, conda-unpack is not mentioned in the official Spark documentation: https://conda.github.io/conda-pack/spark.html.

Additional Context

No response
