Description
Similar to #1041
Apache Iceberg version
None
Please describe the bug 🐞
Problem
I want to read files from multiple S3 regions. For example, my metadata files are in us-west-2 but my data files are in us-east-1. This is not possible currently.
Context
Reading a file in pyarrow requires a location and a file system implementation, fs. For example, location="s3://blah/foo.parquet" and fs=S3FileSystem.
iceberg-python/pyiceberg/io/pyarrow.py
Lines 404 to 419 in 0cebec4
The fs is used to access the files in S3 and is initialized with the given S3_REGION according to the S3 configuration.
iceberg-python/pyiceberg/io/pyarrow.py
Lines 347 to 365 in 0cebec4
This means only 1 S3 region is allowed.
Possible Solution
Create multiple instances of S3FileSystem, one for each region, and fetch the corresponding instance based on location. pyarrow.fs.resolve_s3_region(bucket) can determine the correct region.
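A minimal sketch of this idea, not the actual pyiceberg implementation. The class name RegionAwareS3 and its fs_for method are hypothetical names made up for illustration. The only pyarrow calls used are pyarrow.fs.resolve_s3_region and pyarrow.fs.S3FileSystem(region=...). Credentials and the other S3 properties are left out, and the resolver and factory are injectable so the caching logic can be exercised without network access.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlparse


def _default_resolve_region(bucket: str) -> str:
    # Imported lazily so the caching logic can be used without pyarrow installed.
    # Note: resolve_s3_region performs a network request to look up the bucket.
    from pyarrow.fs import resolve_s3_region

    return resolve_s3_region(bucket)


def _default_make_fs(region: str) -> Any:
    # Assumption: only the region is set here; a real implementation would
    # also pass credentials, endpoint, and the other S3 properties.
    from pyarrow.fs import S3FileSystem

    return S3FileSystem(region=region)


class RegionAwareS3:
    """Hypothetical helper: one cached file system per S3 region."""

    def __init__(
        self,
        resolve_region: Callable[[str], str] = _default_resolve_region,
        make_fs: Callable[[str], Any] = _default_make_fs,
    ) -> None:
        self._resolve_region = resolve_region
        self._make_fs = make_fs
        self._region_by_bucket: Dict[str, str] = {}
        self._fs_by_region: Dict[str, Any] = {}

    def fs_for(self, location: str) -> Any:
        """Return the file system for the region of the bucket in `location`."""
        parsed = urlparse(location)
        if parsed.scheme not in ("s3", "s3a", "s3n"):
            raise ValueError(f"Not an S3 location: {location}")
        bucket = parsed.netloc

        # Resolve each bucket's region once, then reuse the answer.
        region = self._region_by_bucket.get(bucket)
        if region is None:
            region = self._resolve_region(bucket)
            self._region_by_bucket[bucket] = region

        # Create one file system per region and share it across buckets.
        fs = self._fs_by_region.get(region)
        if fs is None:
            fs = self._make_fs(region)
            self._fs_by_region[region] = fs
        return fs
```

With this in place, reading metadata from a bucket in us-west-2 and data from a bucket in us-east-1 would call fs_for(location) for each file and get back a file system bound to the right region. Caching by bucket avoids a region lookup on every file.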