Skip to content

fix: handle scheme empty string#11842

Open
baumgold wants to merge 4 commits intoibis-project:mainfrom
baumgold:patch-1
Open

fix: handle scheme empty string#11842
baumgold wants to merge 4 commits intoibis-project:mainfrom
baumgold:patch-1

Conversation

@baumgold
Copy link

@baumgold baumgold commented Jan 15, 2026

Description of changes

Connecting to a trino database with a user/pass using http_scheme="https" does not work. I believe this is because the do_connect function does not enter the if statement due to the walrus operator returning the scheme as an empty string which implicitly evaluates to False. This PR fixes that issue. Last modified in #9960.

Issues closed

@github-actions github-actions bot added the trino The Trino backend label Jan 15, 2026
@baumgold baumgold force-pushed the patch-1 branch 2 times, most recently from 69d4a94 to 9849a46 Compare January 16, 2026 00:43
@NickCrews
Copy link
Contributor

Thanks @baumgold ! Can you add a test for this?

@baumgold
Copy link
Author

@NickCrews - It doesn't look like the repo currently has any tests specifically for the trino backend (or any specific backend for that matter)

@deepyaman
Copy link
Contributor

deepyaman commented Jan 20, 2026

@NickCrews - It doesn't look like the repo currently has any tests specifically for the trino backend (or any specific backend for that matter)

@baumgold There are backend-specific tests under each backend implementation (in this case, ibis/backends/trino/tests/). You can add a test in https://github.com/ibis-project/ibis/blob/main/ibis/backends/trino/tests/test_client.py.

@github-actions github-actions bot added the tests Issues or PRs related to tests label Jan 21, 2026
@baumgold baumgold force-pushed the patch-1 branch 7 times, most recently from f2afbae to 7608742 Compare January 21, 2026 15:30
@NickCrews NickCrews force-pushed the patch-1 branch 2 times, most recently from 9b06e1f to 976cd7d Compare January 21, 2026 19:46
@NickCrews
Copy link
Contributor

@baumgold I just overhauled the test and implementation. Can you see if that is actually testing the behavior you expect?

@NickCrews NickCrews mentioned this pull request Jan 21, 2026
1 task
@NickCrews
Copy link
Contributor

I think the root cause of the confusion is that the TrinoBackend._from_url() classmethod is converting the host kwarg to be just the hostname part of the url, not the entire url. The trino db api expects the host kwarg to be an entire url, and does scheme inference on it.

@NickCrews
Copy link
Contributor

Honestly, the trino API kinda sucks. It isn't documented well for one thing. Then it prefers the URL for some args, such as http_scheme. This is the opposite semantics of how ibis prefers kwargs over the URL. But then trino doesn't even consider the user from the URL.

So, I don't know if we want to get involved with that can of worms, I think we should maybe just pass through all kwargs directly and do very minimal coercion, and tell all users to refer to the trino docs.

@NickCrews
Copy link
Contributor

There were no tests added in https://github.com/ibis-project/ibis/pull/9960/changes, so I don't actually totally understand what the behavior is with http vs https vs None coming from this comment.

# Do NOT convert to url.hostname, the trino client expects an entire URL
# to do inference on http vs https, port, etc.
# https://github.com/trinodb/trino-python-client/blob/2108c38dea79518ffb74370177df2dc95f1e6d96/trino/dbapi.py#L169
kwargs["host"] = url
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work. I believe the url passed in here is expected to have the form trino://<username>:<password>@<hostname>:<port>/<database> if it's passed from ibis.connect. Most likely the scheme would be trino rather than http or https.

ibis/ibis/backends/__init__.py

Lines 1675 to 1775 in 7cc3686

def connect(resource: Path | str, /, **kwargs: Any) -> BaseBackend:
"""Connect to `resource`, inferring the backend automatically.
The general pattern for `ibis.connect` is
```python
con = ibis.connect("backend://connection-parameters")
```
With many backends that looks like
```python
con = ibis.connect("backend://user:password@host:port/database")
```
See the connection syntax for each backend for details about URL connection
requirements.
Parameters
----------
resource
A URL or path to the resource to be connected to.
kwargs
Backend specific keyword arguments
Examples
--------
Connect to an in-memory DuckDB database:
>>> import ibis
>>> con = ibis.connect("duckdb://")
Connect to an on-disk SQLite database:
>>> con = ibis.connect("sqlite://relative.db")
>>> con = ibis.connect(
... "sqlite:///absolute/path/to/data.db"
... ) # quartodoc: +SKIP # doctest: +SKIP
Connect to a PostgreSQL server:
>>> con = ibis.connect(
... "postgres://user:password@hostname:5432"
... ) # quartodoc: +SKIP # doctest: +SKIP
Connect to BigQuery:
>>> con = ibis.connect(
... "bigquery://my-project/my-dataset"
... ) # quartodoc: +SKIP # doctest: +SKIP
"""
url = resource = str(resource)
if re.match("[A-Za-z]:", url):
# windows path with drive, treat it as a file
url = f"file://{url}"
parsed = urllib.parse.urlparse(url)
scheme = parsed.scheme or "file"
orig_kwargs = kwargs.copy()
kwargs = dict(urllib.parse.parse_qsl(parsed.query))
# convert single parameter lists value to single values
for name, value in kwargs.items():
if len(value) == 1:
kwargs[name] = value[0]
# Merge explicit kwargs with query string, explicit kwargs
# taking precedence
kwargs.update(orig_kwargs)
if scheme == "file":
path = parsed.netloc + parsed.path
if path.endswith(".duckdb"):
return ibis.duckdb.connect(path, **kwargs)
elif path.endswith((".sqlite", ".db")):
return ibis.sqlite.connect(path, **kwargs)
elif path.endswith((".csv", ".csv.gz")):
# Load csv/csv.gz files with duckdb by default
con = ibis.duckdb.connect(**kwargs)
con.read_csv(path)
return con
elif path.endswith(".parquet"):
# Load parquet files with duckdb by default
con = ibis.duckdb.connect(**kwargs)
con.read_parquet(path)
return con
else:
raise ValueError(f"Don't know how to connect to {resource!r}")
# Treat `postgres://` and `postgresql://` the same
scheme = scheme.replace("postgresql", "postgres")
try:
backend = getattr(ibis, scheme)
except AttributeError:
raise ValueError(f"Don't know how to connect to {resource!r}") from None
return backend._from_url(parsed, **kwargs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that may (I'm not totally sure, what about ibis.trino.connect("https://...")?) be true for the scheme, but the trino library also uses the port and other parts of the URL, so we still want to keep the url as-is.

What do you think the solution should be then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at the least, if you can write out the expected behavior in terms of test cases, then that will help.

self,
user: str = "user",
auth: str | None = None,
auth: str | BasicAuthentication | None = None,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should accept any kind of Authentication:

Suggested change
auth: str | BasicAuthentication | None = None,
auth: str | trino.auth.Authentication | None = None,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, thanks!

@baumgold
Copy link
Author

baumgold commented Jan 21, 2026

@NickCrews - Thanks for your assistance and commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests Issues or PRs related to tests trino The Trino backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: trino backend connect incorrectly processes empty scheme

4 participants