Return Iceberg metadata tables in SQLAlchemy reflection #519

metadaddy · 2025-01-15T21:13:52Z

Describe the feature

The Iceberg metadata tables ($partitions, $manifests, $files, etc.) should be available via SQLAlchemy reflection, perhaps by specifying a dialect-specific keyword such as trino_include_metadata_tables.

There does not currently appear to be a way to get the server to list the metadata tables for an Iceberg table, so the client might just have to 'know' that they exist.

Describe alternatives you've considered

You could hardcode the list of metadata table names into a client app, but this would be vulnerable to changes in the Iceberg connector implementation that might introduce new metadata tables.
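For illustration, the hardcoding workaround could look roughly like the sketch below. The suffix list is an assumption and deliberately non-exhaustive: `partitions`, `manifests`, and `files` are the ones named above, and `snapshots` and `history` are added from memory of the Iceberg connector. The helper names (`metadata_table_names`, `quote_identifier`, `metadata_query`) are hypothetical and not part of the Trino client.

```python
# Assumed, non-exhaustive list of Iceberg metadata table suffixes.
# This is exactly the part that would go stale if the connector adds new ones.
ICEBERG_METADATA_SUFFIXES = ("partitions", "manifests", "files", "snapshots", "history")


def metadata_table_names(table, suffixes=ICEBERG_METADATA_SUFFIXES):
    """Return the metadata table names for one base table, e.g. customer$files."""
    return [f"{table}${suffix}" for suffix in suffixes]


def quote_identifier(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def metadata_query(catalog, schema, table, suffix):
    """Build a SELECT against a single metadata table.

    The name contains a '$', so it has to be quoted.
    """
    parts = (catalog, schema, f"{table}${suffix}")
    qualified = ".".join(quote_identifier(part) for part in parts)
    return f"SELECT * FROM {qualified}"


if __name__ == "__main__":
    for name in metadata_table_names("customer"):
        print(name)
    print(metadata_query("iceberg", "demo", "customer", "files"))
```

The generated statements could then be passed to a SQLAlchemy connection via `text(...)`, but nothing here would show up in reflection, which is the gap this issue is about.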

Are you willing to submit PR?

Yes I am willing to submit a PR!

hashhar · 2025-01-16T16:49:36Z

Sounds reasonable to me. Would also be good to check what the JDBC driver's DatabaseMetaData#getTables does for comparison.

hashhar · 2025-02-24T05:52:52Z

I checked what JDBC/engine does:

trino> CREATE CATALOG tpch USING tpch;
    -> CREATE CATALOG iceberg USING iceberg
    -> WITH (
    ->    "fs.native-s3.enabled" = 'true',
    ->    "hive.metastore.uri" = 'thrift://hadoop-master:9083',
    ->    "iceberg.file-format" = 'PARQUET',
    ->    "s3.aws-access-key" = 'minio-access-key',
    ->    -- note the usage of a secret from Vault here making configuration secure
    ->    "s3.aws-secret-key" = '${VAULT:secret/s3:secret-key}',
    ->    "s3.endpoint" = 'http://minio:9080/',
    ->    "s3.path-style-access" = 'true',
    ->    "s3.region" = 'us-east-1'
    -> );
CREATE CATALOG
CREATE CATALOG

trino> CREATE SCHEMA iceberg.demo WITH (location = 's3://test-bucket/demo');
    -> CREATE TABLE iceberg.demo.customer AS SELECT * FROM tpch.sf1.customer;
    -> SHOW TABLES FROM iceberg.demo;
CREATE SCHEMA
CREATE TABLE: 150000 rows

trino> SELECT * FROM system.jdbc.tables WHERE table_cat = 'iceberg';
 table_cat |    table_schem     |    table_name     | table_type | remarks | type_cat | type_schem | type_name | self_referencing_col_name | ref_generation
-----------+--------------------+-------------------+------------+---------+----------+------------+-----------+---------------------------+----------------
 iceberg   | information_schema | applicable_roles  | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | tables            | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | views             | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | table_privileges  | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | enabled_roles     | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | roles             | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | columns           | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | information_schema | schemata          | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | schema_discovery   | shallow_discovery | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | schema_discovery   | discovery         | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
 iceberg   | demo               | customer          | TABLE      | NULL    | NULL     | NULL       | NULL      | NULL                      | NULL
(11 rows)

It doesn't return any system tables. So I'm closing this issue for consistency's sake.

Also note that BigQuery's SQLAlchemy dialect doesn't return the "meta" tables/columns either.

hashhar added the enhancement New feature or request label Jan 16, 2025

hashhar closed this as completed Feb 24, 2025
