Enable reading WASB and WASBS file paths with ABFS and ABFSS #10127
I marked this as Snowflake because it is the offending writer, but I think this applies generally to all engines.
I'm also interested in making this contribution to Iceberg if others are okay with this change and can give me some pointers on the best way to ensure it is applied to every file path.
Is there a difference in that wasbs is for blob storage, while abfss requires the hierarchical namespace (ADLSv2) to be enabled? If you try to read a blob storage account using the ADLS SDK, certain things will not work. (iceberg-azure seems to require abfss and ADLSv2. I was actually wishing it would work with wasbs, using the blob storage APIs, because ADLSv2 doesn't support automatic blob versioning.)
@ms1111 raises a good point, as there are some known incompatibilities between the low-level Blob and ADLS APIs: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues. However, whether those discrepancies matter for Iceberg/Hadoop use cases seems to depend on which subset of each API the different client-side implementations actually use. It looks like while the legacy Hadoop
And the helper function doesn't really care if the input is
And then some of the methods just delegate through to the "blob client" instead of the "datalake client": https://github.com/Azure/azure-sdk-for-java/blob/0aa45226a625aa19da7183800bb90531eb1f1ee2/sdk/storage/azure-storage-file-datalake/src/main/java/com/azure/storage/file/datalake/DataLakeFileClient.java#L1079
While it'll be important for all service providers to also migrate to no longer producing … Are there any Azure experts who can confirm that … If it'll be guaranteed to remain drop-in compatible, it seems like one approach could be to include …
A cursory check through https://github.com/apache/iceberg/blob/main/azure/src/main/java/org/apache/iceberg/azure/adlsv2 seems to only reveal three methods used:
And all three appear to just delegate through to the internal … It could also be safest to just make … @njriasan I'm happy to help review a PR if you want to take a stab at this (we'll naturally still need a committer to review it as well).
Bringing this back up, it seems like this is a general Iceberg issue. I think we should just add WASB and WASBS to the resolving FileIO, as suggested above.
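A rough sketch of what adding WASB and WASBS to a resolving FileIO could look like: a scheme-to-implementation lookup in which wasb and wasbs route to the same ADLS implementation as abfs and abfss. This is a standalone illustration, not Iceberg's ResolvingFileIO source; the class name SchemeResolver, the HadoopFileIO fallback and the exact table contents are assumptions. Whether ADLSFileIO can then handle a blob.core.windows.net location unchanged, or whether the host also has to be rewritten, is the open question discussed in the comments above.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of scheme-based FileIO resolution; not Iceberg source code. */
public final class SchemeResolver {
  private static final String ADLS_IMPL = "org.apache.iceberg.azure.adlsv2.ADLSFileIO";
  // Assumed fallback for schemes with no dedicated implementation.
  private static final String FALLBACK_IMPL = "org.apache.iceberg.hadoop.HadoopFileIO";

  private static final Map<String, String> SCHEME_TO_IMPL =
      Map.of(
          "s3", "org.apache.iceberg.aws.s3.S3FileIO",
          "gs", "org.apache.iceberg.gcp.gcs.GCSFileIO",
          "abfs", ADLS_IMPL,
          "abfss", ADLS_IMPL,
          // The proposed addition: route legacy WASB(S) schemes to the ADLS implementation.
          "wasb", ADLS_IMPL,
          "wasbs", ADLS_IMPL);

  private SchemeResolver() {}

  /** Returns the lowercase scheme of a location, or null if it has none. */
  static String scheme(String location) {
    int idx = location.indexOf("://");
    return idx < 0 ? null : location.substring(0, idx).toLowerCase(Locale.ROOT);
  }

  /** Returns the FileIO class name that this sketch would pick for the given location. */
  public static String implFor(String location) {
    String scheme = scheme(location);
    return scheme == null ? FALLBACK_IMPL : SCHEME_TO_IMPL.getOrDefault(scheme, FALLBACK_IMPL);
  }

  public static void main(String[] args) {
    System.out.println(implFor("wasbs://data@myaccount.blob.core.windows.net/warehouse/t"));
  }
}
```

Keeping the mapping as class-name strings mirrors the idea of loading an implementation lazily by name, so a missing Azure dependency would only matter when an Azure location is actually used.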
Should we reopen this issue, since the fix was reverted and the problem still exists?
Feature Request / Improvement
When you set up a managed Snowflake Iceberg table on an Azure account, Snowflake provides locations that use wasbs:// and not abfss://. WASB is currently deprecated by Azure, and everyone is encouraged to use ABFS instead. While Snowflake should really change this behavior, in the spirit of allowing people to "update" Iceberg tables without rewriting all the metadata files, it would be great if the Iceberg library could handle this automatically.
It's my understanding that you can convert a wasbs URI to an abfss URI by making just two changes: replacing wasbs:// with abfss://, and replacing blob.core.windows.net with dfs.core.windows.net.
If this change could be applied when loading the location and file information from metadata files, then every user could effortlessly switch to using abfs.
Query engine
Snowflake
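A minimal Java sketch of the two-step rewrite described in the feature request, assuming locations follow the common wasbs://container@account.blob.core.windows.net/path layout. The class and method names (WasbToAbfs, toAbfs) are hypothetical and not part of Iceberg, and mapping plain wasb to abfs is an assumption by analogy with the TLS pair. Anything that doesn't match (other schemes, custom endpoints, or an explicit port in the authority) is returned unchanged.

```java
import java.util.Locale;

/** Hypothetical helper: rewrites WASB(S) locations to their ABFS(S) equivalents. */
public final class WasbToAbfs {
  private static final String BLOB_SUFFIX = ".blob.core.windows.net";
  private static final String DFS_SUFFIX = ".dfs.core.windows.net";

  private WasbToAbfs() {}

  /** Returns the ABFS(S) form of a WASB(S) location, or the input unchanged if it does not match. */
  public static String toAbfs(String location) {
    int schemeEnd = location.indexOf("://");
    if (schemeEnd < 0) {
      return location; // no scheme, nothing to rewrite
    }

    // Step 1: map the scheme (wasbs -> abfss; wasb -> abfs is assumed by analogy).
    String scheme = location.substring(0, schemeEnd).toLowerCase(Locale.ROOT);
    String newScheme;
    if (scheme.equals("wasbs")) {
      newScheme = "abfss";
    } else if (scheme.equals("wasb")) {
      newScheme = "abfs";
    } else {
      return location;
    }

    // The authority is everything up to the first '/', e.g. "container@account.blob.core.windows.net".
    String rest = location.substring(schemeEnd + 3);
    int slash = rest.indexOf('/');
    String authority = slash < 0 ? rest : rest.substring(0, slash);
    String path = slash < 0 ? "" : rest.substring(slash);

    // Step 2: swap the blob endpoint for the dfs endpoint; leave non-standard hosts alone.
    if (!authority.toLowerCase(Locale.ROOT).endsWith(BLOB_SUFFIX)) {
      return location;
    }
    String account = authority.substring(0, authority.length() - BLOB_SUFFIX.length());
    return newScheme + "://" + account + DFS_SUFFIX + path;
  }

  public static void main(String[] args) {
    System.out.println(
        toAbfs("wasbs://data@myaccount.blob.core.windows.net/warehouse/db/t/metadata/v1.metadata.json"));
  }
}
```

Running main prints abfss://data@myaccount.dfs.core.windows.net/warehouse/db/t/metadata/v1.metadata.json. Plain string handling is used instead of java.net.URI because file paths stored in metadata may not be valid URIs (for example, they may contain unescaped spaces). Whether the rewritten location is actually readable still depends on the Blob vs ADLS compatibility concerns raised in the comments.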