
How to read/write Iceberg table from/to ADLS gen 2 container with PyIceberg #1588

Closed
@HungYangChang

Description

Question

I would like to read/write an Iceberg table from/to ADLS Gen2 with PyIceberg.

Background:

  • I know how to use Spark to read/write the table, and I have successfully done it locally.
  • The code I use to write an Iceberg table through the Nessie catalog:
import os

from pyspark.sql import SparkSession

ACCOUNT_NAME = "example-events"  # Storage account name
CONTAINER_NAME = "example-events-iceberg-debug-local"  # Container name
WAREHOUSE_PATH = f"abfs://{CONTAINER_NAME}@{ACCOUNT_NAME}.dfs.core.windows.net/"

NAME_SPACE = "demo"  # Target namespace (placeholder)
ref = "main"  # Nessie branch ("main" is the default)
hadoop_home = os.environ.get("HADOOP_HOME", "")  # Local Hadoop installation

# Service principal credentials, read from the environment
AZURE_TENANT_ID = os.environ["AZURE_TENANT_ID"]
AZURE_CLIENT_ID = os.environ["AZURE_CLIENT_ID"]
AZURE_CLIENT_SECRET = os.environ["AZURE_CLIENT_SECRET"]

# Note: the fs.azure.* keys must target the same storage account as WAREHOUSE_PATH
spark = SparkSession.builder \
    .config("spark.jars", "jars/bundle-2.17.178.jar,"
                          "jars/iceberg-spark-runtime-3.5_2.12-1.5.2.jar,"
                          "jars/nessie-spark-extensions-3.5_2.12-0.101.3.jar,"
                          "jars/url-connection-client-2.17.178.jar,"
                          "jars/postgresql-42.5.0.jar,"
                          "jars/azure-storage-blob-12.20.0.jar,"
                          "jars/azure-identity-1.8.1.jar,"
                          "jars/hadoop-common-3.3.5.jar,"
                          "jars/hadoop-client-3.3.5.jar,"
                          "jars/hadoop-azure-3.3.5.jar,"
                          "jars/hadoop-azure-datalake-3.3.5.jar") \
    .config("spark.hadoop.io.native.lib.available", "false") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
                                    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions") \
    .config("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1") \
    .config("spark.sql.catalog.nessie.ref", ref) \
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog") \
    .config("spark.sql.catalog.nessie.warehouse", WAREHOUSE_PATH) \
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.driver.extraJavaOptions", f"-Dhadoop.home.dir={hadoop_home}") \
    .config("spark.executor.extraJavaOptions", f"-Dhadoop.home.dir={hadoop_home}") \
    .config(f"spark.hadoop.fs.azure.account.auth.type.{ACCOUNT_NAME}.dfs.core.windows.net", "OAuth") \
    .config(f"spark.hadoop.fs.azure.account.oauth.provider.type.{ACCOUNT_NAME}.dfs.core.windows.net",
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider") \
    .config(f"spark.hadoop.fs.azure.account.oauth2.client.id.{ACCOUNT_NAME}.dfs.core.windows.net", AZURE_CLIENT_ID) \
    .config(f"spark.hadoop.fs.azure.account.oauth2.client.secret.{ACCOUNT_NAME}.dfs.core.windows.net", AZURE_CLIENT_SECRET) \
    .config(f"spark.hadoop.fs.azure.account.oauth2.client.endpoint.{ACCOUNT_NAME}.dfs.core.windows.net",
            f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/oauth2/token") \
    .getOrCreate()

# Create namespace
spark.sql(f"CREATE NAMESPACE IF NOT EXISTS nessie.{NAME_SPACE}")
print(f"Namespace '{NAME_SPACE}' created successfully!")

# Create a table
spark.sql(f"CREATE TABLE nessie.{NAME_SPACE}.names (name STRING) USING iceberg")
print("Table 'names' created successfully!")

# Insert data
spark.sql(f"INSERT INTO nessie.{NAME_SPACE}.names VALUES ('Alex Merced'), ('Dipankar Mazumdar'), ('Jason Hughes')")
print("Data inserted successfully!")

I can confirm the write succeeds against the local Nessie server (http://localhost:19120) and the data is stored in the ADLS Gen2 container.

Nessie server: [screenshot]

ADLS Gen2: [screenshot]

However, I still cannot figure out how to read/write the Iceberg table from/to ADLS Gen2 with PyIceberg.

I have reviewed https://py.iceberg.apache.org/#getting-started-with-pyiceberg but have had no luck so far. A sketch of what I expect to work is below.
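
This is a minimal sketch, not a confirmed working setup: it assumes a Nessie version that exposes the Iceberg REST API under /iceberg (Nessie >= 0.74), PyIceberg installed with ADLS support (pip install "pyiceberg[adlfs]"), and PyIceberg's documented adls.* FileIO properties; the catalog name and branch are placeholders, and the credential variables are the same ones defined for the Spark job above.

from pyiceberg.catalog import load_catalog
import pyarrow as pa

# Connect to Nessie through its Iceberg REST endpoint; "main" is the
# default Nessie branch. The adls.* keys configure PyIceberg's ADLS FileIO
# with the same service-principal credentials used in the Spark job above.
catalog = load_catalog(
    "nessie",
    **{
        "type": "rest",
        "uri": "http://localhost:19120/iceberg/main",
        "adls.account-name": ACCOUNT_NAME,
        "adls.tenant-id": AZURE_TENANT_ID,
        "adls.client-id": AZURE_CLIENT_ID,
        "adls.client-secret": AZURE_CLIENT_SECRET,
    },
)

# Read: scan the table written by Spark into an Arrow table
table = catalog.load_table(f"{NAME_SPACE}.names")
print(table.scan().to_arrow())

# Write: append one more row
table.append(pa.table({"name": ["New Name"]}))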
