Currently, when using the TableProviderFactory mechanism from DataFusion, one needs to specify the exact full path to the metadata file as the location, e.g.
create external table inventory
stored as iceberg
location 's3://iceberg/public/inventory/metadata/00001-97ea515a-2d2f-465d-8c74-8daec5ab0023.metadata.json';
I think it would be nice if IcebergTableProviderFactory also supported being pointed to a metadata directory, e.g.
create external table inventory
stored as iceberg
location 's3://iceberg/public/inventory/metadata';
This would imply listing that directory, finding the latest metadata file (e.g. by the version V in filenames like <V>-<random-uuid>.metadata.json and maybe the legacy v<V>.metadata.json), parsing it, and using it to build the table, since that is likely the overwhelming use case. That would improve the flexibility and ergonomics of the integration (e.g. by making quick prototyping much simpler).
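To make the file-selection step concrete, here is a minimal sketch of picking the latest metadata file from a list of file names. The function names `parse_metadata_version` and `latest_metadata_file` are hypothetical and not part of iceberg-rust; the sketch assumes the directory listing has already been obtained (e.g. from the object store) and that the highest V is the latest version. It ignores compressed metadata files and any other naming variants.

```rust
/// Hypothetical helper: extract the version number V from a metadata file name.
/// Handles `<V>-<random-uuid>.metadata.json` and the legacy `v<V>.metadata.json`.
/// Returns None for anything else (e.g. manifest files).
fn parse_metadata_version(file_name: &str) -> Option<u64> {
    let stem = file_name.strip_suffix(".metadata.json")?;
    if let Some(legacy) = stem.strip_prefix('v') {
        // Legacy layout: the whole remainder is the version number.
        return legacy.parse().ok();
    }
    // Current layout: the version is everything before the first '-'.
    let (version, _uuid) = stem.split_once('-')?;
    version.parse().ok()
}

/// Hypothetical helper: pick the file name with the highest version,
/// skipping names that do not look like metadata files.
fn latest_metadata_file<'a, I>(names: I) -> Option<&'a str>
where
    I: IntoIterator<Item = &'a str>,
{
    names
        .into_iter()
        .filter_map(|name| parse_metadata_version(name).map(|v| (v, name)))
        .max_by_key(|(version, _)| *version)
        .map(|(_, name)| name)
}
```

A factory could call `latest_metadata_file` on the listed file names and then load the selected file exactly as it does today for a full path. Note that this only inspects names; it does not check that the chosen file is the one a catalog currently points to.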
To be clear, I think the DF CREATE EXTERNAL TABLE construct (formalized via TableProviderFactory) is at most loosely coupled to a catalog. In fact, the typical use case is just registering a pre-existing table in an ad-hoc manner for some read-only queries. To me this is analogous to FDWs in Postgres.
While it is possible to wire-up the write path for those, it would require implementing TableProvider::insert_into in
(but I also think this is orthogonal to the ask here).
Contrast that with a regular CREATE TABLE construct, which would correspond to full coupling with a catalog, targeting tables native to the given system. The full life-cycle of the table would then need to be tracked, but that is also beyond the scope of this issue.
At least that is how we perceive and use them; I would be curious to hear other interpretations.