Skip to content

Conversation

@mehradpk
Copy link
Contributor

@mehradpk mehradpk commented May 14, 2025

Description

Introduce support for Azure storage backends including Azure Blob Storage (using the wasb:// scheme) and Azure Data Lake Storage Gen2 (using the abfs:// scheme) in the Hive connector.

Key changes:

  • Added HiveAzureConfigurationInitializer to inject relevant Azure configurations into Hadoop Configuration

  • Introduced HiveAzureConfig to allow catalog-level configuration of Azure properties

  • Updated HdfsConfigurationInitializer and HiveConnectorFactory to delegate Azure-specific config setup

  • Registered configuration initializer in Hive module

Supports shared key and OAuth2-based authentication.

##Motivation and Context
Several enterprise data lake workloads are hosted on Azure storage platforms. This change allows Presto to directly query data from Azure Blob Storage and ADLS Gen2, bringing Azure compatibility in line with other cloud storage systems like Amazon S3 and Google Cloud Storage.

Impact

  • Adds support for Azure cloud storage within the Hive connector

  • Introduces new configuration properties under HiveAzureConfig

No breaking changes to existing Hive catalogs or connectors

Test Plan

Test done via Hive Connector.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

== RELEASE NOTES ==

Hive Connector Changes
* Add support for Azure Blob Storage and Azure Data Lake Storage Gen2 in the Hive connector
* Add support for wasb[s]:// and abfs[s]:// URI schemes
* Add support for shared key and OAuth2 authentication for Azure storage

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 14, 2025
@prestodb-ci
Copy link
Contributor

@imjalpreet imported this issue as lakehouse/presto #25107

Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mehradpk Thank you for the PR, I have taken an initial pass on the PR.

Have you carried out some actual tests with Azure?

{
return optional.filter(value -> !value.isEmpty());
}
private static Proxy proxyForHost(HostAndPort address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing the PR. I have carried out real test with Azure storage account.

The method proxyForHost was unused, so I’ve removed it. It likely came in earlier as part of the ADLS Gen1 support, which has since been removed in this implementation.

Comment on lines 119 to 120
configBinder(binder).bindConfig(HiveAzureConfig.class);
binder.bind(AzureConfigurationInitializer.class).to(HiveAzureConfigurationInitializer.class).in(Scopes.SINGLETON);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have already created HiveAzureModule, please re-use that. You can bind it in DeltaConnectorFactory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I’ve now removed the separate binding and reused the existing HiveAzureModule by binding it in DeltaConnectorFactory as recommended.

Comment on lines 41 to 67
public HiveAzureConfigurationInitializer(HiveAzureConfig config)
{
this.wasbAccessKey = dropEmpty(config.getWasbAccessKey());
this.wasbStorageAccount = dropEmpty(config.getWasbStorageAccount());
if (wasbAccessKey.isPresent() || wasbStorageAccount.isPresent()) {
checkArgument(
wasbAccessKey.isPresent() && wasbStorageAccount.isPresent(),
"If WASB storage account or access key is set, both must be set");
}

this.abfsAccessKey = dropEmpty(config.getAbfsAccessKey());
this.abfsStorageAccount = dropEmpty(config.getAbfsStorageAccount());
this.abfsOAuthClientEndpoint = dropEmpty(config.getAbfsOAuthClientEndpoint());
this.abfsOAuthClientId = dropEmpty(config.getAbfsOAuthClientId());
this.abfsOAuthClientSecret = dropEmpty(config.getAbfsOAuthClientSecret());
if (abfsOAuthClientEndpoint.isPresent() || abfsOAuthClientSecret.isPresent() || abfsOAuthClientId.isPresent()) {
checkArgument(
abfsOAuthClientEndpoint.isPresent() && abfsOAuthClientId.isPresent() && abfsOAuthClientSecret.isPresent(),
"If any of ABFS OAuth2 Client endpoint, ID, and secret are set, all must be set.");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow the same catalog to have both WASB and ABFS configs? Is it supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the implementation accepts both WASB and ABFS configs if all their required fields are present. However, there's no specific logic that handles what happens if both are configured — it depends on the Hadoop file system behavior. If we want to restrict using both in a single catalog, I can add a validation check to ensure only one type is configured.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to validate what happens if we configure both of them simultaneously? Based on that, we can decide whether or not to restrict this behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a UT for this class

Comment on lines 111 to 113
configBinder(binder).bindConfig(HiveAzureConfig.class);
binder.bind(GcsConfigurationInitializer.class).to(HiveGcsConfigurationInitializer.class).in(Scopes.SINGLETON);
binder.bind(AzureConfigurationInitializer.class).to(HiveAzureConfigurationInitializer.class).in(Scopes.SINGLETON);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, as we have already created HiveAzureModule, please re-use that. You can bind it in HudiConnectorFactory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines 140 to 142
configBinder(binder).bindConfig(HiveAzureConfig.class);
binder.bind(GcsConfigurationInitializer.class).to(HiveGcsConfigurationInitializer.class).in(Scopes.SINGLETON);
binder.bind(AzureConfigurationInitializer.class).to(HiveAzureConfigurationInitializer.class).in(Scopes.SINGLETON);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have already created HiveAzureModule, please re-use that. You can bind it in InternalIcebergConnectorFactory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated.

@imjalpreet
Copy link
Member

@mehradpk, can you please help rebase this PR on master? Hadoop Upgrade is merged to master, so we can mark this PR as ready for review. Thanks!

@mehradpk mehradpk marked this pull request as ready for review September 22, 2025 04:06
@prestodb-ci prestodb-ci requested review from a team, auden-woolfson and jp-sivaprasad and removed request for a team September 22, 2025 04:06
@mehradpk mehradpk requested a review from imjalpreet September 22, 2025 04:07
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Sep 22, 2025

Reviewer's Guide

Adds Azure Blob Storage (wasb://) and ADLS Gen2 (abfs://) support to the Hive connector by introducing new Azure-specific configuration classes and modules, and integrating them into existing HDFS configuration initializers, connector factories, and tests.

Class diagram for updated HdfsConfigurationInitializer and connector factories

classDiagram
    class HdfsConfigurationInitializer {
        -AzureConfigurationInitializer azureConfigurationInitialize
        +HdfsConfigurationInitializer(..., AzureConfigurationInitializer)
        +updateConfiguration(Configuration)
    }
    HdfsConfigurationInitializer o-- AzureConfigurationInitializer

    class HiveConnectorFactory {
        +create(...)
    }
    HiveConnectorFactory --> HiveAzureModule

    class HiveAzureModule {
        +setup(Binder)
    }
    HiveAzureModule --> HiveAzureConfig
    HiveAzureModule --> HiveAzureConfigurationInitializer

    class IcebergNativeCatalogFactory {
        -AzureConfigurationInitializer azureConfigurationInitialize
        +IcebergNativeCatalogFactory(..., AzureConfigurationInitializer)
        +getHadoopConfiguration()
    }
    IcebergNativeCatalogFactory o-- AzureConfigurationInitializer

    class IcebergNessieCatalogFactory {
        +IcebergNessieCatalogFactory(..., AzureConfigurationInitializer)
    }
    IcebergNessieCatalogFactory o-- AzureConfigurationInitializer

    class IcebergRestCatalogFactory {
        +IcebergRestCatalogFactory(..., AzureConfigurationInitializer)
    }
    IcebergRestCatalogFactory o-- AzureConfigurationInitializer
Loading

File-Level Changes

Change Details Files
Introduce Azure configuration and initialization classes
  • Define HiveAzureConfig for wasb and abfs account, key, and OAuth2 properties
  • Create AzureConfigurationInitializer interface
  • Implement HiveAzureConfigurationInitializer to apply Azure settings to Hadoop Configuration
  • Add HiveAzureModule for Guice binding of Azure config and initializer
HiveAzureConfig.java
AzureConfigurationInitializer.java
HiveAzureConfigurationInitializer.java
HiveAzureModule.java
Extend HDFS and Iceberg initializers to include Azure setup
  • Add azureConfigurationInitialize parameter to HdfsConfigurationInitializer and Iceberg configuration constructors
  • Invoke azureConfigurationInitialize.updateConfiguration in updateConfiguration methods
  • Propagate Azure initializer through IcebergNativeCatalogFactory, IcebergNessieCatalogFactory, IcebergRestCatalogFactory
HdfsConfigurationInitializer.java
IcebergNativeCatalogFactory.java
IcebergNessieCatalogFactory.java
IcebergRestCatalogFactory.java
Register Azure module in connector factories
  • Include HiveAzureModule in DeltaConnectorFactory, HiveConnectorFactory, HudiConnectorFactory, and InternalIcebergConnectorFactory chains
  • Bind AzureConfigurationInitializer through HiveConnectorFactory setup
DeltaConnectorFactory.java
HiveConnectorFactory.java
HudiConnectorFactory.java
InternalIcebergConnectorFactory.java
Update tests to inject Azure initializer
  • Modify HiveTestUtils to instantiate HiveAzureConfigurationInitializer
  • Update Iceberg and Hive file system tests to pass azure initializer where HdfsConfigurationInitializer is constructed
HiveTestUtils.java
TestIcebergSmokeHadoop.java
TestIcebergSmokeOnS3Hadoop.java
TestIcebergSmokeNessie.java
TestIcebergSmokeRest.java
TestIcebergSmokeRestNestedNamespace.java
AbstractTestHiveFileSystemS3.java
S3SelectTestHelper.java
IcebergDistributedTestBase.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-hive/src/main/java/com/facebook/presto/hive/azure/HiveAzureConfigurationInitializer.java:72-77` </location>
<code_context>
+            config.set("fs.abfs.impl", AzureBlobFileSystem.class.getName());
+        }
+
+        if (abfsOAuthClientEndpoint.isPresent() && abfsOAuthClientId.isPresent() && abfsOAuthClientSecret.isPresent()) {
+            config.set(format("fs.azure.account.auth.type.%s.dfs.core.windows.net", abfsStorageAccount.get()), "OAuth");
+            config.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
+            config.set(format("fs.azure.account.oauth2.client.endpoint.%s.dfs.core.windows.net", abfsStorageAccount.get()), abfsOAuthClientEndpoint.get());
+            config.set(format("fs.azure.account.oauth2.client.id.%s.dfs.core.windows.net", abfsStorageAccount.get()), abfsOAuthClientId.get());
+            config.set(format("fs.azure.account.oauth2.client.secret.%s.dfs.core.windows.net", abfsStorageAccount.get()), abfsOAuthClientSecret.get());
+        }
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Potential NPE if abfsStorageAccount is not present when setting OAuth config.

Please add an explicit check to ensure abfsStorageAccount is present before calling get(), or include it in the earlier checkArgument for OAuth parameters to prevent a NullPointerException.
</issue_to_address>

### Comment 2
<location> `presto-hive/src/main/java/com/facebook/presto/hive/azure/HiveAzureConfigurationInitializer.java:67-70` </location>
<code_context>
+
+        if (abfsAccessKey.isPresent() && abfsStorageAccount.isPresent()) {
+            config.set(format("fs.azure.account.key.%s.dfs.core.windows.net", abfsStorageAccount.get()), abfsAccessKey.get());
+            config.set("fs.abfs.impl", AzureBlobFileSystem.class.getName());
+        }
+
</code_context>

<issue_to_address>
**suggestion:** Setting 'fs.abfs.impl' may override user configuration.

Check if 'fs.abfs.impl' is already set in the configuration before assigning it, or clearly document that this will override any existing value.

```suggestion
        if (abfsAccessKey.isPresent() && abfsStorageAccount.isPresent()) {
            config.set(format("fs.azure.account.key.%s.dfs.core.windows.net", abfsStorageAccount.get()), abfsAccessKey.get());
            if (config.get("fs.abfs.impl") == null) {
                config.set("fs.abfs.impl", AzureBlobFileSystem.class.getName());
            }
        }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@steveburnett
Copy link
Contributor

Should any documentation be added? Perhaps some examples? https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst

@mehradpk mehradpk changed the title [DO NOT REVIEW] Add support for Azure Blob Storage and ADLS Gen2 in Hive connector Add support for Azure Blob Storage and ADLS Gen2 in Hive connector Sep 23, 2025
@mehradpk
Copy link
Contributor Author

Should any documentation be added? Perhaps some examples? https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst

Yes, documentation should be updated. Since ADLS support is being added, we should document the required configuration properties (e.g., fs.azure.account.key, fs.azure.account.auth.type, etc.), how users can enable it, and any limitations.

I can add a new subsection under the Hive connector.

@mehradpk mehradpk changed the title Add support for Azure Blob Storage and ADLS Gen2 in Hive connector feat: Add support for Azure Blob Storage and ADLS Gen2 in Hive connector Oct 30, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New security issues found

@mehradpk
Copy link
Contributor Author

@imjalpreet All changes addressed. Re-requested review but it may not have triggered properly - could you please review the changes?

@steveburnett Documentation has been added, Could you please review?

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! The content looks good, but there are several formatting errors.

You might find https://github.com/prestodb/presto/tree/master/presto-docs#building-the-documentation and https://github.com/prestodb/presto/tree/master/presto-docs#viewing-the-documentation helpful to run local doc builds and test your documentation format locally.

-------------------

The Hive connector supports Azure Blob Storage (``wasb://``) and Azure Data Lake Storage Gen2 (``abfs://``).
Configure Azure storage access in your catalog properties file (e.g., ``etc/catalog/hive.properties``).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Configure Azure storage access in your catalog properties file (e.g., ``etc/catalog/hive.properties``).
Configure Azure storage access in your catalog properties file, such as in ``etc/catalog/hive.properties``.

Avoid Latin abbreviations. See the GitLab documentation style guide recommended word list entry for e.g. for alternatives. I've suggested one alternative here, please revise it if a different alternative fits better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Configuration Properties
^^^^^^^^^^^^^^^^^^^^^^^^

================================================= =====================================================================
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table in lines 737-757 isn't displaying in the local doc build. See screenshot.

Screenshot 2025-10-30 at 10 24 36 AM

Property Name Description
================================================= =====================================================================
``hive.azure.wasb.storage-account`` Azure Blob Storage account name
(without ``.blob.core.windows.net suffix``)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suffix should not be within the formatting ticks

``hive.azure.wasb.access-key`` Azure Blob Storage access key

``hive.azure.abfs.storage-account`` ADLS Gen2 storage account name
(without ``.blob.core.windows.net suffix``)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suffix should not be within the formatting ticks

* WASB requires both ``wasb.storage-account`` and ``wasb.access-key`` to be set
* ABFS with shared key requires both ``abfs.storage-account`` and ``abfs.access-key`` to be set
* ABFS with OAuth requires ``abfs.storage-account``, ``abfs.oauth.client-endpoint``, ``abfs.oauth.client-id``, and
``abfs.oauth.client-secret`` (all four properties) to be set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 765 should be part of the list item above it. See screenshot for how it currently displays in a local doc build.
Screenshot 2025-10-30 at 10 24 36 AM copy

abfs://<container>@<storage-account>.dfs.core.windows.net/<path>

Usage Examples
^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
^^^^^^^^^^
^^^^^^^^^^^^^^

#####################

If you see ``403 Forbidden`` or ``401 Unauthorized``:
* Verify storage account names are correct (without .blob.core.windows.net or .dfs.core.windows.net suffix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unordered lists in Authentication Errors, Connection Errors, and Path Errors have formatting errors.

Screenshot 2025-10-30 at 10 32 34 AM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed all doc related issues.

Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @mehradpk. I finished another round of review. It mostly looks good now, with some minor suggestions.


configBinder(binder).bindConfig(HiveGcsConfig.class);
binder.bind(GcsConfigurationInitializer.class).to(HiveGcsConfigurationInitializer.class).in(Scopes.SINGLETON);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unrelated, please revert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 42 to 46
if (wasbAccessKey.isPresent() || wasbStorageAccount.isPresent()) {
checkArgument(
wasbAccessKey.isPresent() && wasbStorageAccount.isPresent(),
"If WASB storage account or access key is set, both must be set");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (wasbAccessKey.isPresent() || wasbStorageAccount.isPresent()) {
checkArgument(
wasbAccessKey.isPresent() && wasbStorageAccount.isPresent(),
"If WASB storage account or access key is set, both must be set");
}
checkArgument(
wasbAccessKey.isPresent() == wasbStorageAccount.isPresent(),
"If WASB storage account or access key is set, both must be set");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

this.abfsOAuthClientEndpoint = dropEmpty(config.getAbfsOAuthClientEndpoint());
this.abfsOAuthClientId = dropEmpty(config.getAbfsOAuthClientId());
this.abfsOAuthClientSecret = dropEmpty(config.getAbfsOAuthClientSecret());
if (abfsOAuthClientEndpoint.isPresent() || abfsOAuthClientSecret.isPresent() || abfsOAuthClientId.isPresent()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abfsStorageAccount.isPresent() is missing in the if condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this piece of code to handle all scenarios. Previously, including abfsStorageAccount in the OAuth check can result in error if a user intended to use only the access key. Now the code separately validates - access key and OAuth.

Comment on lines 41 to 67
public HiveAzureConfigurationInitializer(HiveAzureConfig config)
{
this.wasbAccessKey = dropEmpty(config.getWasbAccessKey());
this.wasbStorageAccount = dropEmpty(config.getWasbStorageAccount());
if (wasbAccessKey.isPresent() || wasbStorageAccount.isPresent()) {
checkArgument(
wasbAccessKey.isPresent() && wasbStorageAccount.isPresent(),
"If WASB storage account or access key is set, both must be set");
}

this.abfsAccessKey = dropEmpty(config.getAbfsAccessKey());
this.abfsStorageAccount = dropEmpty(config.getAbfsStorageAccount());
this.abfsOAuthClientEndpoint = dropEmpty(config.getAbfsOAuthClientEndpoint());
this.abfsOAuthClientId = dropEmpty(config.getAbfsOAuthClientId());
this.abfsOAuthClientSecret = dropEmpty(config.getAbfsOAuthClientSecret());
if (abfsOAuthClientEndpoint.isPresent() || abfsOAuthClientSecret.isPresent() || abfsOAuthClientId.isPresent()) {
checkArgument(
abfsOAuthClientEndpoint.isPresent() && abfsOAuthClientId.isPresent() && abfsOAuthClientSecret.isPresent(),
"If any of ABFS OAuth2 Client endpoint, ID, and secret are set, all must be set.");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to validate what happens if we configure both of them simultaneously? Based on that, we can decide whether or not to restrict this behavior.

}
}

if (abfsOAuthClientEndpoint.isPresent() && abfsOAuthClientId.isPresent() && abfsOAuthClientSecret.isPresent()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abfsStorageAccount.isPresent() check is missing here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

import static com.facebook.airlift.configuration.ConfigBinder.configBinder;

public class HiveAzureModule
extends AbstractConfigurationAwareModule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to use AbstractConfigurationAwareModule here, as we are not binding based on the value of any configuration.

You can implement com.google.inject.Module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

configBinder(binder).bindConfig(MetastoreClientConfig.class);
configBinder(binder).bindConfig(ThriftHiveMetastoreConfig.class);
configBinder(binder).bindConfig(HiveGcsConfig.class);
// configBinder(binder).bindConfig(HiveAzureConfig.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove commented code

configBinder(binder).bindConfig(HiveGcsConfig.class);
// configBinder(binder).bindConfig(HiveAzureConfig.class);
binder.bind(GcsConfigurationInitializer.class).to(HiveGcsConfigurationInitializer.class).in(Scopes.SINGLETON);
// binder.bind(AzureConfigurationInitializer.class).to(HiveAzureConfigurationInitializer.class).in(Scopes.SINGLETON);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove commented code

configBinder(binder).bindConfig(FileMergeCacheConfig.class);
binder.bind(CacheStats.class).in(Scopes.SINGLETON);
configBinder(binder).bindConfig(HiveGcsConfig.class);
// configBinder(binder).bindConfig(HiveAzureConfig.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove commented code

configBinder(binder).bindConfig(HiveGcsConfig.class);
// configBinder(binder).bindConfig(HiveAzureConfig.class);
binder.bind(GcsConfigurationInitializer.class).to(HiveGcsConfigurationInitializer.class).in(Scopes.SINGLETON);
// binder.bind(AzureConfigurationInitializer.class).to(HiveAzureConfigurationInitializer.class).in(Scopes.SINGLETON);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove commented code

Introduce support for Azure storage backends including Azure Blob Storage
(using the wasbs:// scheme) and Azure Data Lake Storage Gen2 (using the abfss://
scheme) in the Hive connector.

Key changes:
- Added HiveAzureConfigurationInitializer to inject relevant Azure configurations
  into Hadoop Configuration
- Introduced HiveAzureConfig to allow catalog-level configuration of Azure properties
- Updated HdfsConfigurationInitializer and HiveConnectorFactory to delegate Azure-specific
  config setup
- Registered configuration initializer in Hive module

Supports shared key and OAuth2-based authentication.
@mehradpk
Copy link
Contributor Author

@imjalpreet #25107 (comment) --> Yes.

After testing the behavior of the HiveAzureConfigurationInitializer with different configurations:

  • ABFS + WASB together: This works perfectly fine. The two are independent, as their configuration properties and endpoints are separate, so having both configured does not cause any issues.

  • ABFS - OAuth + AccessKey together: If both are configured, OAuth takes precedence over AccessKey at runtime. But, there’s no fallback to AccessKey if OAuth authentication credentials are not valid, the operation will fail.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build. Looks good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants