Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHAR_LENGTH for postgres #4555

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

CHAR_LENGTH for postgres #4555

wants to merge 3 commits into from

Conversation

pruzko
Copy link
Contributor

@pruzko pruzko commented Jan 2, 2025

I've added support for CHAR_LEN[GTH] for postgres. I've also reused the concept of exp.Length(this='x', binary=True) from bigquery/snowflake (though I don't think is ever used in those dialects).

Copy link
Collaborator

@VaggelisD VaggelisD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Could you share concrete examples in which CHAR_LENGTH and LENGTH differ in Postgres? The psql docs specify that LENGTH returns the number of characters as does CHAR_LENGTH, in contrast to e.g MySQL:

mysql> SELECT LENGTH('€'), CHAR_LENGTH('€');
3 1

psql> SELECT LENGTH('€'), LENGTH('€', 'UTF8'), CHAR_LENGTH('€');
1 1 1

@pruzko
Copy link
Contributor Author

pruzko commented Jan 6, 2025

OOPS! You're right, I messed something up in my notes. It is now fixed.

Also, parse_one('SELECT LENGTH(x)', 'mysql').sq('mysql') == 'SELECT LENGTH(x)' as it should and parse_one('SELECT LENGTH(x)', 'sqlite').sql('mysql') == 'SELECT CHAR_LENGTH(x)' as it also should.

Copy link
Collaborator

@VaggelisD VaggelisD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do note that Postgres supports an encoding arg in LENGTH, currently this will be dropped afaict. If you want, you can extend exp.Length to also store that
  2. Could we add a couple Postgres tests as well? They'd look like:
self.validate_identity("LENGTH(x)")
self.validate_identity("LENGTH(x, y)")
self.validate_identity("CHAR_LENGTH(x)", "LENGTH(x)")
self.validate_identity("CHAR_LEN(x)", "LENGTH(x)")

With these we should cover all Postgres roundtrips (the 2nd test requires bullet 1)

@@ -377,6 +377,7 @@ class Parser(parser.Parser):
"GENERATE_SERIES": _build_generate_series,
"JSON_EXTRACT_PATH": build_json_extract_path(exp.JSONExtract),
"JSON_EXTRACT_PATH_TEXT": build_json_extract_path(exp.JSONExtractScalar),
"LEN": lambda args: exp.Length(this=seq_get(args, 0), binary=True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be removed? Can't find LEN on Postgres's docs, and even if it exists it's probably an alias of LENGTH which is not binary (?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants