feat: Add database to read_ #9165

Open
1 task done
markdruffel-8451 opened this issue May 9, 2024 · 3 comments
Labels
bigquery The BigQuery backend duckdb The DuckDB backend feature Features or general enhancements mssql The Microsoft SQL Server backend pyspark The Apache PySpark backend snowflake The Snowflake backend

Comments

@markdruffel-8451
Is your feature request related to a problem?

The read_ family of functions allows the user to provide a table name, but the file is always written to the default catalog and database.

What is the motivation behind your request?

If I run the code below, I get an error: [TEMP_VIEW_NAME_TOO_MANY_NAME_PARTS](https://docs.microsoft.com/azure/databricks/error-messages/error-classes#temp_view_name_too_many_name_parts) CREATE TEMPORARY VIEW or the corresponding Dataset APIs only accept single-part view names, but got: comms_media_dev.dart_extensions.test_table. SQLSTATE: 428EK.

```python
import ibis
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
ispark = ibis.pyspark.connect(session=spark)
idf = ispark.read_parquet(
    source="abfss://my_parquet",
    table_name="comms_media_dev.dart_extensions.test_table",
)
```

I can easily resolve this by doing the following:

```python
ispark._session.catalog.setCurrentCatalog("comms_media_dev")
ispark._session.catalog.setCurrentDatabase("dart_extensions")
idf = ispark.read_parquet(source="abfss://my_parquet", table_name="test_table")
```

This is only a problem because I'm using ibis in a data pipeline: I don't want concurrent nodes setting the current catalog and database outside the write operation itself, because those settings might conflict.

Describe the solution you'd like

Ideally the read_ functions would have a database parameter, but allowing table_name to accept {catalog}.{database}.{table} would work as well.
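The second option boils down to splitting a dotted name into its parts. A minimal sketch of that parsing step (the function name `parse_table_name` is hypothetical, not an ibis API):

```python
def parse_table_name(table_name: str) -> tuple:
    """Split a possibly qualified table name into (catalog, database, table).

    Missing leading parts come back as None, so "test_table" yields
    (None, None, "test_table").
    """
    parts = table_name.split(".")
    if len(parts) > 3:
        raise ValueError(f"too many name parts: {table_name!r}")
    # Left-pad with None so the tuple always has three slots.
    catalog, database, table = [None] * (3 - len(parts)) + parts
    return catalog, database, table
```

A backend could then route the fully qualified parts to its own catalog/database handling instead of issuing a single-part `CREATE TEMPORARY VIEW`.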

What version of ibis are you running?

10.0.0.dev49

What backend(s) are you using, if any?

pyspark

Code of Conduct

  • I agree to follow this project's Code of Conduct
@markdruffel-8451 markdruffel-8451 added the feature Features or general enhancements label May 9, 2024
@gforsyth gforsyth added pyspark The Apache PySpark backend duckdb The DuckDB backend labels May 9, 2024
@gforsyth
Member

gforsyth commented May 9, 2024

Hey @markdruffel-8451 -- thanks for raising this! I think this makes a bunch of sense for the backends where we have catalog/database support, and a database kwarg will have a nice symmetry with the rest of the API.

As an interim workaround, you can make use of a private context manager to handle setting and unsetting the catalog and database (note that this is a private API and might break without warning, but hopefully won't break before we add the database kwarg):

```python
with ispark._active_catalog_database("comms_media_dev", "dart_extensions"):
    idf = ispark.read_parquet(source="abfss://my_parquet", table_name="test_table")
```
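The appeal of a context manager here is that it restores the previous catalog and database even if the read fails. A minimal sketch of that pattern with a stand-in catalog object (`FakeCatalog` and `active_catalog_database` are illustrative only; ibis's actual `_active_catalog_database` implementation may differ):

```python
from contextlib import contextmanager


class FakeCatalog:
    """Stand-in for a Spark session catalog, for illustration only."""

    def __init__(self):
        self._catalog = "spark_catalog"
        self._database = "default"

    def setCurrentCatalog(self, name):
        self._catalog = name

    def setCurrentDatabase(self, name):
        self._database = name

    def currentCatalog(self):
        return self._catalog

    def currentDatabase(self):
        return self._database


@contextmanager
def active_catalog_database(catalog_api, catalog, database):
    # Remember the previous settings so they can be restored on exit.
    prev_catalog = catalog_api.currentCatalog()
    prev_database = catalog_api.currentDatabase()
    try:
        catalog_api.setCurrentCatalog(catalog)
        catalog_api.setCurrentDatabase(database)
        yield
    finally:
        # Restore even if the body raised, so concurrent settings
        # never leak outside the block.
        catalog_api.setCurrentCatalog(prev_catalog)
        catalog_api.setCurrentDatabase(prev_database)
```

Note that this still mutates session-wide state while the block is active, so it mitigates rather than eliminates the concurrency concern raised above.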

@gforsyth
Member

Hey @markdruffel-8451 -- I'm going to keep this open so we can track adding the database kwarg!

@gforsyth gforsyth added bigquery The BigQuery backend snowflake The Snowflake backend mssql The Microsoft SQL Server backend labels May 10, 2024
@gforsyth
Member

This also applies to read_csv and the other read_ methods.

Projects
Status: backlog
Development

No branches or pull requests

2 participants