Issue in Reading Iceberg tables in Nessie + Minio using Pyiceberg #1560

Open
heman026 opened this issue Jan 22, 2025 · 6 comments

@heman026

Question

I am getting "Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json" when loading a table using the PyIceberg REST catalog. I am using the Nessie catalog, with MinIO for storage.

from pyiceberg.catalog import load_catalog

# Connect to Nessie's Iceberg REST endpoint; table data is stored in MinIO.
catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.55.134.161:19120/iceberg",
        "s3.endpoint": "http://10.55.134.161:9000",
        "warehouse": "warehouse",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    },
)
con = catalog.load_table('test.emp')

Nessie Configuration Used:

java \
  -Dquarkus.management.port=9090 \
  -Dnessie.version.store.type=JDBC \
  -Dquarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/nessie_db \
  -Dquarkus.datasource.username=nessie \
  -Dquarkus.datasource.password=nessie \
  -Dnessie.catalog.default-warehouse=warehouse \
  -Dnessie.catalog.warehouses.warehouse.location=s3a://iceberg-datalake \
  -Dnessie.catalog.service.s3.default-options.endpoint=http://10.55.134.161:9000 \
  -Dnessie.catalog.service.s3.default-options.path-style-access=true \
  -Dnessie.catalog.service.s3.default-options.access-key=minioadmin \
  -Dnessie.catalog.service.s3.default-options.secret-key=minioadmin \
  -Dnessie.server.authentication.enabled=false \
  -Dnessie.catalog.service.s3.default-options.region=us-east-1 \
  -jar nessie-quarkus-0.100.2-runner.jar

Error

Exception has occurred: BadRequestError

IllegalArgumentException: java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json

File "C:\pyiceberg\catalog\rest.py", line 697, in load_table
    response.raise_for_status()
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://10.55.134.161:19120/iceberg/v1/main%7Cwarehouse/namespaces/test/tables/emp

The above exception was the direct cause of the following exception:

File "C:\pyiceberg\catalog\rest.py", line 476, in _handle_non_200_response
    raise exception(response) from exc
File "C:\Hemanath\KAI\Iceberg Evaluation\Docker\duck\pyiceberg1\catalog\rest.py", line 699, in load_table
    self._handle_non_200_response(exc, {404: NoSuchTableError})
File "C:\duck.py", line 55, in <module>
    con = catalog.load_table('test.emp')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyiceberg1.exceptions.BadRequestError: IllegalArgumentException: java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json
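
The failing request URL contains a percent-encoded segment, `main%7Cwarehouse`. Decoding it with the standard library shows that the path prefix is `main|warehouse`, which appears to combine the branch name with the `warehouse` value passed to `load_catalog` (that reading is an interpretation, not something confirmed in this thread):

```python
from urllib.parse import unquote, urlsplit

# URL copied from the HTTPError in the traceback.
url = "http://10.55.134.161:19120/iceberg/v1/main%7Cwarehouse/namespaces/test/tables/emp"

# Split the path into segments and decode percent-escapes in each one.
segments = [unquote(s) for s in urlsplit(url).path.split("/") if s]
print(segments)
```

The third segment is the decoded prefix; the remaining segments name the namespace (`test`) and the table (`emp`).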

Note: I also disabled S3 request signing in Nessie (-Dnessie.catalog.service.s3.default-options.request-signing-enabled=false), but I still get the same error.

Please help me resolve this. Thanks
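
When narrowing down a `Failed to read table metadata` error, it can help to confirm that the object named in the message actually exists in the MinIO bucket. A small sketch that splits the reported location into bucket and key using only the standard library; `split_s3_location` is a hypothetical helper written for illustration, and the lookup of the object itself (for example in the MinIO console) is left out:

```python
from urllib.parse import urlsplit


def split_s3_location(location):
    """Split an s3://, s3a:// or s3n:// location into (bucket, key)."""
    parts = urlsplit(location)
    if parts.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError("not an S3-style location: " + location)
    # The bucket is the authority part; the key is the path without its leading slash.
    return parts.netloc, parts.path.lstrip("/")


# Location copied from the error message in this issue.
location = (
    "s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/"
    "metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json"
)
bucket, key = split_s3_location(location)
print("bucket:", bucket)
print("key:", key)
```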

@Fokko
Contributor

Fokko commented Jan 22, 2025

@heman026 Thanks for raising this issue. I'm not super familiar with Nessie, but I do notice that the warehouse configuration should be an s3 path: s3a://iceberg-datalake/

@HungYangChang

HungYangChang commented Jan 27, 2025

Hi @Fokko

I have a similar question about reading an Iceberg table from a Nessie server.

I set up a Nessie server locally, and I would like to access the Iceberg table.

Here is the output of response = requests.get("http://localhost:19120/api/v1/config", auth=HTTPBasicAuth("test-nessie", "test-nessie"))

Response JSON: {'defaultBranch': 'main', 'maxSupportedApiVersion': 2}

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "nessie",
    uri="http://localhost:19120/api",
    ref="main",
    authentication={
        "type": "BASIC",
        "username": "test-nessie",
        "password": "test-nessie",
    },
)

This code fails with 2 validation errors for ConfigResponse.
ConfigResponse expects the following format:

class ConfigResponse(IcebergBaseModel):
    defaults: Properties = Field()
    overrides: Properties = Field()

However, the response contains defaultBranch and maxSupportedApiVersion instead.
Any thoughts on how to read from a local server?
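
One possible reading of the mismatch: PyIceberg's REST catalog fetches its configuration from `<uri>/v1/config`, so with `uri` set to `http://localhost:19120/api` it would request the same `/api/v1/config` endpoint queried above with `requests.get`, which returns the Nessie-specific payload rather than an Iceberg REST one. The sketch below only illustrates that URL construction; `config_url` is a hypothetical helper, not a PyIceberg function, and the `/iceberg` base path is taken from the configuration at the top of this issue.

```python
def config_url(uri):
    """Return the config endpoint an Iceberg REST client would request for `uri`."""
    return uri.rstrip("/") + "/v1/config"


# Base path used in the snippet above.
print(config_url("http://localhost:19120/api"))
# Base path used in the original report at the top of this issue.
print(config_url("http://localhost:19120/iceberg"))
```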

@heman026
Author

@HungYangChang Check this #1524

@kevinjqliu
Contributor

good catch @heman026, did that resolve your issue?

@HungYangChang

HungYangChang commented Jan 28, 2025

It is not working for me so far; I am still trying.

My config:

from pyiceberg.catalog import load_catalog

print("Initializing Nessie client...")
catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.3.120.105:19120/iceberg",
        "authentication.type": "BASIC",
        "authentication.username": "test-nessie",
        "authentication.password": "test-nessie",
    },
)
print("Set up correctly")

I still get the same error:

pydantic_core._pydantic_core.ValidationError: 2 validation errors for ConfigResponse
defaults
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
overrides
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing

I also tried http://localhost:19120/iceberg, and it does not work either.
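
The two `Field required` errors correspond to the two fields of `ConfigResponse` quoted earlier in this thread (`defaults` and `overrides`). A minimal sketch of that check, using a plain dictionary instead of pydantic; `missing_config_fields` is a hypothetical helper for illustration, and the second payload is an assumed minimal example of a matching shape, not output from a real server:

```python
# Field names taken from the ConfigResponse model quoted earlier in this thread.
REQUIRED_FIELDS = ("defaults", "overrides")


def missing_config_fields(payload):
    """List the required ConfigResponse fields that are absent from `payload`."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


# Payload reported in this thread (the Nessie-specific config response).
nessie_native = {"defaultBranch": "main", "maxSupportedApiVersion": 2}
# Assumed minimal payload with the expected shape.
iceberg_rest = {"defaults": {}, "overrides": {}}

print(missing_config_fields(nessie_native))
print(missing_config_fields(iceberg_rest))
```

Running a check like this against whatever the configured `uri` returns from `/v1/config` may show quickly whether the client is reaching an Iceberg REST endpoint at all.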

My version:
pyiceberg=0.8.1

Btw, I can confirm the Nessie server is set up correctly:
[Image]

under: http://localhost:19120/content/main/demo_0128_v3/names
[Image]

@kevinjqliu
Contributor

@HungYangChang I would recommend looking at the Nessie documentation on how to connect to an Iceberg REST catalog. PyIceberg accepts the standard Iceberg REST catalog configurations: https://py.iceberg.apache.org/configuration/#rest-catalog
