[bug] read from multiple s3 regions #1279
Comments
This may be a similar issue for GCS/Azure, since we only cache one instance of each FileSystem.
Hey @kevinjqliu, I would be happy to work on this task. Thanks!
@danhphan assigned to you! LMK if you have any questions.
Thanks @kevinjqliu, I'm reading the codebase. Could you please give me an example of the expected unit tests for this feature, if possible? For instance, if we create the following
I'm thinking maybe in the Like:
Thank you.
Given two files in different regions, I want to read them transparently, without knowing which region they belong to.
Perhaps we also need to think about configuration. Setting the
Yes @kevinjqliu, it seems that I am still not able to fully understand the requirements for this change. I think I will need more time to read the code, and maybe try some simpler tasks first to learn more about the codebase. Please un-assign me from the task for now if anyone else can help with it. Thank you.
Hey @kevinjqliu, can I work on this issue if it still needs to be worked on? Thanks!
Yes! Assigned to you.
Closed by #1453
Similar to #1041
Apache Iceberg version
None
Please describe the bug 🐞
Problem
I want to read files from multiple S3 regions. For example, my metadata files are in us-west-2 but my data files are in us-east-1. This is currently not possible.
Context
Reading a file in pyarrow requires a location and a file system implementation, fs. For example, location="s3://blah/foo.parquet" and fs=S3FileSystem.
iceberg-python/pyiceberg/io/pyarrow.py
Lines 404 to 419 in 0cebec4
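A minimal sketch of how a location such as s3://blah/foo.parquet could be split before handing it to a filesystem, assuming (as pyarrow's S3FileSystem does) that the filesystem addresses objects as bucket/key without the s3:// prefix. The helper split_location is hypothetical, not pyiceberg's API; pyiceberg's own location parsing may differ.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_location(location: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split a URI into (scheme, bucket, fs_path).

    fs_path is "bucket/key", the form pyarrow's S3FileSystem expects.
    """
    uri = urlparse(location)
    if not uri.scheme:
        # No scheme at all: treat the location as a local path.
        return "file", "", location
    bucket = uri.netloc
    fs_path = f"{bucket}{uri.path}"
    return uri.scheme, bucket, fs_path


scheme, bucket, path = split_location("s3://blah/foo.parquet")
print(scheme, bucket, path)  # s3 blah blah/foo.parquet
```

The bucket is what a per-region lookup would need, and the path is what would be passed to a call such as fs.open_input_file(path).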
The fs is used to access the files in S3, and it is initialized with the given S3_REGION from the S3 configuration.
iceberg-python/pyiceberg/io/pyarrow.py
Lines 347 to 365 in 0cebec4
This means only one S3 region is supported.
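To make the limitation concrete, here is a simplified, hypothetical model of the current behavior: one filesystem is created per scheme, its region comes from the "s3.region" property (assumed here to correspond to pyiceberg's S3_REGION setting), and that instance is cached. FakeS3FileSystem stands in for pyarrow.fs.S3FileSystem so the sketch runs offline, and SingleRegionFileIO is not pyiceberg's real class.

```python
from typing import Any, Callable, Dict, Optional


class FakeS3FileSystem:
    """Offline stand-in for pyarrow.fs.S3FileSystem; it only records its region."""

    def __init__(self, region: Optional[str] = None) -> None:
        self.region = region


class SingleRegionFileIO:
    """Hypothetical model of the current behavior: one cached filesystem per scheme."""

    def __init__(
        self,
        properties: Dict[str, str],
        fs_factory: Callable[..., Any] = FakeS3FileSystem,
    ) -> None:
        self.properties = properties
        self._fs_factory = fs_factory
        self._fs_cache: Dict[str, Any] = {}

    def fs_by_scheme(self, scheme: str) -> Any:
        # The cache key is only the scheme, so every s3:// location shares one
        # instance, configured with the single region from the properties.
        if scheme not in self._fs_cache:
            self._fs_cache[scheme] = self._fs_factory(region=self.properties.get("s3.region"))
        return self._fs_cache[scheme]


io = SingleRegionFileIO({"s3.region": "us-west-2"})
metadata_fs = io.fs_by_scheme("s3")  # metadata files live in a us-west-2 bucket
data_fs = io.fs_by_scheme("s3")      # data files live in a us-east-1 bucket
print(metadata_fs is data_fs, data_fs.region)  # True us-west-2
```

Both lookups return the same us-west-2 instance, so reads from the us-east-1 bucket go through a filesystem configured for the wrong region.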
Possible Solution
Create multiple instances of S3FileSystem, one for each region, and fetch the corresponding instance based on the location. pyarrow.fs.resolve_s3_region(bucket) can determine the correct region for a bucket.
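A sketch of the proposed solution under these assumptions: each bucket's region is resolved once (in real code, presumably via pyarrow.fs.resolve_s3_region, which needs network access), filesystems are cached per (scheme, region) instead of per scheme, and if resolution fails the configured "s3.region" is used as a fallback. MultiRegionFileIO, FakeS3FileSystem, and the fallback policy are illustrative choices, not necessarily how #1453 implements the fix.

```python
from typing import Any, Callable, Dict, Optional, Tuple
from urllib.parse import urlparse


class FakeS3FileSystem:
    """Offline stand-in for pyarrow.fs.S3FileSystem; it only records its region."""

    def __init__(self, region: Optional[str] = None) -> None:
        self.region = region


def default_region_resolver(bucket: str) -> str:
    # Imported lazily: requires pyarrow and network access to S3.
    from pyarrow.fs import resolve_s3_region
    return resolve_s3_region(bucket)


class MultiRegionFileIO:
    """Hypothetical FileIO caching one filesystem per (scheme, region); assumes s3:// locations."""

    def __init__(
        self,
        properties: Dict[str, str],
        fs_factory: Callable[..., Any] = FakeS3FileSystem,
        region_resolver: Callable[[str], str] = default_region_resolver,
    ) -> None:
        self.properties = properties
        self._fs_factory = fs_factory
        self._resolve_region = region_resolver
        self._region_by_bucket: Dict[str, Optional[str]] = {}
        self._fs_cache: Dict[Tuple[str, Optional[str]], Any] = {}

    def _region_for(self, bucket: str) -> Optional[str]:
        # Resolve each bucket's region once and remember it.
        if bucket not in self._region_by_bucket:
            try:
                region: Optional[str] = self._resolve_region(bucket)
            except Exception:
                # Assumed fallback policy: use the configured region.
                region = self.properties.get("s3.region")
            self._region_by_bucket[bucket] = region
        return self._region_by_bucket[bucket]

    def fs_for(self, location: str) -> Any:
        uri = urlparse(location)
        region = self._region_for(uri.netloc)
        key = (uri.scheme, region)
        if key not in self._fs_cache:
            self._fs_cache[key] = self._fs_factory(region=region)
        return self._fs_cache[key]


# Usage with a fake resolver, so no network access is needed.
bucket_regions = {"metadata-bucket": "us-west-2", "data-bucket": "us-east-1"}
io = MultiRegionFileIO({"s3.region": "us-west-2"}, region_resolver=bucket_regions.__getitem__)
meta_fs = io.fs_for("s3://metadata-bucket/metadata/v1.metadata.json")
data_fs = io.fs_for("s3://data-bucket/data/part-0.parquet")
print(meta_fs.region, data_fs.region)  # us-west-2 us-east-1
```

Passing fs_factory=pyarrow.fs.S3FileSystem and keeping the default resolver would use the real classes. Caching the bucket-to-region mapping avoids one resolution round trip per file, and keying the filesystem cache on region means buckets in the same region still share one instance.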