IS_ASCII support #4557

pruzko
Contributor

@pruzko pruzko commented Jan 2, 2025

Support for IS_ASCII for a few engines. Issue #4233

@VaggelisD
Collaborator

Hey @pruzko, thank you for the PR! As with the other ones, please share any documentation that validates your design choices, it's not straightforward why IS_ASCII is transpiled as such to the other dialects.

@pruzko
Contributor Author

pruzko commented Jan 6, 2025

First things first, IS_ASCII is not supported natively by any engine AFAIK. Either way, people are looking for it (e.g., here, here, or here) and come up with some (more) hideous solutions. There was a brief discussion on Slack about whether this is too exotic to have in sqlglot, but I gave it a go.

There are no docs because these are tricks of my own 🧙‍♂️.

SQLITE:

x GLOB '*[\x00-\x7f]*' # look for any ASCII char
x GLOB '*[^\x00-\x7f]*' # look for any non-ASCII char
NOT x GLOB '*[^\x00-\x7f]*' # no non-ASCII chars were found, i.e., all chars are ASCII
NOT x GLOB '*[^\x01-\x7f]*' # the NULL byte is a string terminator in SQLite, so we can safely remove it
NOT x GLOB CAST(x'2a5b5e012d7f5d2a' as TEXT) # SQLite does not support hex-encoded char literals, so we have to convert the string at runtime
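
The final SQLite expression can be tried out with Python's built-in `sqlite3` module. This is only a minimal sketch: the helper name `sqlite_is_ascii` and the in-memory database are assumptions made for illustration, not part of the PR.

```python
import sqlite3
from typing import Optional

# Hex encoding of the GLOB pattern '*[^\x01-\x7f]*' from the comment above.
PATTERN_HEX = "2a5b5e012d7f5d2a"

# NOT binds more loosely than GLOB, so this reads as NOT (? GLOB <pattern>).
QUERY = f"SELECT NOT ? GLOB CAST(x'{PATTERN_HEX}' AS TEXT)"


def sqlite_is_ascii(conn: sqlite3.Connection, value: Optional[str]) -> Optional[bool]:
    """Hypothetical helper: evaluate the GLOB-based check for one value."""
    (result,) = conn.execute(QUERY, (value,)).fetchone()
    # SQLite yields NULL for NULL input; pass that through as None.
    return None if result is None else bool(result)


conn = sqlite3.connect(":memory:")
ascii_result = sqlite_is_ascii(conn, "hello")
non_ascii_result = sqlite_is_ascii(conn, "héllo")
empty_result = sqlite_is_ascii(conn, "")
```

Decoding the hex string with `bytes.fromhex(PATTERN_HEX)` is a quick way to confirm that it spells out the intended pattern.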

MySQL:

REGEXP_LIKE(x, '^[[:ascii:]]*$') # all chars are ASCII

Postgres:

x ~ '^[[:ascii:]]*$' # all chars are ASCII
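
The anchored-regex idea used for MySQL and Postgres can be sketched in Python for comparison. Python's `re` module has no POSIX bracket classes such as `[[:ascii:]]`, so this sketch swaps in the explicit range `[\x00-\x7f]` as a stand-in; the function name `is_ascii_regex` is hypothetical.

```python
import re

# Stand-in for '^[[:ascii:]]*$': fullmatch() anchors at both ends,
# and the explicit range replaces the POSIX [:ascii:] class.
ASCII_ONLY = re.compile(r"[\x00-\x7f]*")


def is_ascii_regex(value: str) -> bool:
    """Return True when every character of value is in the ASCII range."""
    return ASCII_ONLY.fullmatch(value) is not None


plain = is_ascii_regex("abc 123")
accented = is_ascii_regex("naïve")
empty = is_ascii_regex("")
```

Because the quantifier is `*`, the empty string counts as ASCII here as well.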

TSQL:

PATINDEX('%[\x00-\x7f]%', x) # look for any ASCII char
PATINDEX('%[^\x00-\x7f]%', x) # look for any non-ASCII char
PATINDEX('%[^\x00-\x7f]%', x) = 0 # no non-ASCII chars were found, i.e., all chars are ASCII
PATINDEX('%[^' + char(0) + '-' + char(0x7f) + ']%', x) = 0 # TSQL does not support hex-encoded char literals, so we have to build the pattern at runtime
PATINDEX('%[^' + char(0) + '-' + char(0x7f) + ']%' COLLATE Latin1_General_BIN, x) = 0 # the pattern has non-printable chars in it, so we have to switch the collation to Latin1_General_BIN
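
The "search for a non-ASCII char and require position 0" logic can be mimicked in Python to make the double negation easier to follow. This is a rough emulation under stated assumptions, not T-SQL itself: `patindex_non_ascii` and `is_ascii_via_patindex` are hypothetical names, and collation behaviour is not modelled.

```python
import re

# Negated class: matches the first character outside 0x00-0x7f.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def patindex_non_ascii(value: str) -> int:
    """Mimic PATINDEX's convention: 1-based position of the first match, 0 if none."""
    match = NON_ASCII.search(value)
    return match.start() + 1 if match else 0


def is_ascii_via_patindex(value: str) -> bool:
    """All characters are ASCII exactly when no non-ASCII character is found."""
    return patindex_non_ascii(value) == 0


first_bad = patindex_non_ascii("aé")
clean = is_ascii_via_patindex("abc")
```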

Oracle:

REGEXP_LIKE(x , '^[\x01-\x7f]*$') # all chars are ASCII, also the NULL byte is a string terminator so it's removed
REGEXP_LIKE(x , '^[' || chr(1) || '-' || chr(127) || ']*$') # hex-encoded string literals are not supported, so we resolve at runtime
NVL(REGEXP_LIKE(x , '^[' || chr(1) || '-' || chr(127) || ']*$'), TRUE) # ensure TRUE on NULL just like the other engines
