Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Capture expression name part delimiters #2001

Conversation

mountaincrab
Copy link
Contributor

Context

A lot of databases (e.g. Snowflake, Databricks, Redshift) support querying nested data. For example, a query like this
on Snowflake:

SELECT 
  ("i"."json_data":"rocket_name")::VARCHAR AS "rocket_name" 
FROM 
    "mydatabase"."myschema"."mytable" as "i"

queries the json_data column of the mytable table, extracting the rocket_name field from the JSON data that
exists in the column. Note the colon used between json_data and rocket_name.

We would like to know if a column expression is referring to nested data or not. Examining the delimiters (e.g. .,
:) can help determine this.

Problem

Currently, statements like the above (which contain colons as delimiters) can be parsed (as of #1134). However, the delimiters are not captured, which means they're not available in any of the model objects like Column and Table. It also means if you parse:

"i"."json_data":"rocket_name"

and then deparse it, you get:

"i"."json_data"."rocket_name"

Solution

This PR introduces a change to the grammar so that the delimiters are captured. Furthermore, the model classes have been
updated:

  • Column: new tableDelimiter field - the delimiter used to delimit the column name from the table name
  • Table: new delimiterParts field - the delimiters used to delimit each table name part


import java.util.List;

public class ObjectNames {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is only used by the parser - it's the return value of the RelObjectNames production. I wasn't sure where to put it, as I don't think there are any other similar examples. Is it OK here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it into the parser grammar then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'll add it as a static class in the parser

} {
token = RelObjectNameExt() { data.add(token); }
( LOOKAHEAD (2) ("." | ":") ("." { data.add(null); })* token = RelObjectNameExt2() { data.add(token); } ) *
(
LOOKAHEAD (2) ( delimiter = "." | delimiter = ":" ) { delimiters.add(delimiter.image); } ( delimiter = "." { data.add(null); delimiters.add(delimiter.image); })*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the colon to the optional part here as well, but it wasn't there before I made the change (("." { data.add(null); })*), so just wanted to double check that's OK? If it's OK, the line will read

LOOKAHEAD (2) ( delimiter = "." | delimiter = ":" ) { delimiters.add(delimiter.image); } ((delimiter =  "." | delimiter = ":") { data.add(null); delimiters.add(delimiter.image); })*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks consistent to me, I would do the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mountaincrab mountaincrab force-pushed the feat/capture-name-part-delimiters branch from 4db1a41 to 8d57096 Compare May 2, 2024 15:10
@manticore-projects
Copy link
Contributor

Thank you for your contribution and interest in JSQLParser.
This is heavy stuff and I do need a few minutes to digest it please.

@manticore-projects
Copy link
Contributor

I am sorry, I am old school and lost on JSON:

In my understanding, this should not be even parsed as a Colum, but as a Key Value Pair:

"i"."json_data":"rocket_name"

At the same time a Column with an Alias is nothing else but a Value Key pair.
So I wonder if it would not make more sense to define a new entity KeyValuePair or JSONColumn for this purpose.

@manticore-projects
Copy link
Contributor

Honestly, this is a tough one:

  1. your solution and implementation is robust and deals with the problem fast

  2. also you did not cause it, because someone else messed up column already (could even have been me)

  3. I do understand that JSON is an important part these days

  4. but I feel like a Database Column should be a Database Column and a MAP/JSON Key Value Pair should be differentiated. At the same time I can't tell you any clear benefit since Column is a Value Key Pair and so is the JSon Value Key pair.

What's the way forward? Do we want to take the "hack" right now, document it with a lot @Todo: separate Key-Value pairs from Columns, watch what happens in real life and take care of it later?

@manticore-projects
Copy link
Contributor

Idea!

Why don't we parse ("i"."json_data":"rocket_name")::VARCHAR as a Column with an Alias, only that Alias has ":" instead of "AS". In my opinion this would be the most flexible and correct solution here. Alias would get a type property with variants like COLUMN_ALIAS and MAP_KEY.

What do you think?

@mountaincrab
Copy link
Contributor Author

Hi @manticore-projects, thanks for the speedy responses! 😁

this should not be even parsed as a Colum, but as a Key Value Pair

Not sure I understand sorry - what would the key be and what would the value be? Note that the bit of the expression that references the json can be any length e.g. mytable.mycolumn:grandparent.parent.child....

So I wonder if it would not make more sense to define a new entity KeyValuePair or JSONColumn for this purpose.

The problem is, we can't actually know if it's a Json expression or not just by looking at the colon, because I think certain databases (e.g. informix? - the one in #1134) use colons for all delimiters - like normal column expressions. So how would we know to parse it into a JsonColumn or a normal Column?

Why don't we parse ("i"."json_data":"rocket_name")::VARCHAR as a Column with an Alias, only that Alias has ":" instead of "AS".

Not sure I understand sorry 🤔 so we'd have a new field on Column, which is an Alias? And the Alias would contain ":rocket_name"? If so, I feel like that's less accurate/a little more confusing than what we currently have.

@manticore-projects
Copy link
Contributor

Not sure I understand sorry - what would the key be and what would the value be? Note that the bit of the expression that references the json can be any length e.g. mytable.mycolumn:grandparent.parent.child....

I guess it is me, who does not understand it correctly.
What does mytable.mycolumn:grandparent.parent.child... mean regarding the part grandparent.parent.child?

I came from the perspective that Map entries like value:key are similar to columns, but your example has obviously a different meaning.

The problem is, we can't actually know if it's a Json expression or not just by looking at the colon, because I think certain databases (e.g. informix? - the one in #1134) use colons for all delimiters - like normal column expressions. So how would we know to parse it into a JsonColumn or a normal Column?

Very strong argument, I totally get your point.

I think we can just go with your improvements and see what will happen.

@mountaincrab
Copy link
Contributor Author

No problem at all, sorry, I probably should have given some example json to illustrate it...

What does mytable.mycolumn:grandparent.parent.child... mean regarding the part grandparent.parent.child?

So say for example you load some json into the mycolumn column of the mytable table. You load 2 rows like:

// row 1
{
  "rocket": {
    "engine": {
      "power": "loads of power"
    }
  }
}
// row 2
{
  "rocket": {
    "engine": {
      "power": "even more power"
    }
  }
}

If you execute

SELECT mycolumn:rocket.engine.power AS rocket_power FROM mytable;

It'll return 2 rows like:

rocket_power
---------------
loads of power
even more power

@manticore-projects
Copy link
Contributor

No problem at all, sorry, I probably should have given some example json to illustrate it...

What does mytable.mycolumn:grandparent.parent.child... mean regarding the part grandparent.parent.child?

So say for example you load some json into the mycolumn column of the mytable table. You load 2 rows like:

// row 1
{
  "rocket": {
    "engine": {
      "power": "loads of power"
    }
  }
}
// row 2
{
  "rocket": {
    "engine": {
      "power": "even more power"
    }
  }
}

If you execute

SELECT mycolumn:rocket.engine.power AS rocket_power FROM mytable;

It'll return 2 rows like:

rocket_power
---------------
loads of power
even more power

Thank you very much for the explanation, very helpful. I have been too stuck in my current maps and key/value pairs.
So basically we are talking more about a path into the content of a column, but not a key.

I think you implementation is good and fixes the problem.
Ready to merge when you correct the Spotless complaints. Run './gradlew :spotlessApply' to fix these violations.

Thank you again for your contribution and time, well done!

@mountaincrab
Copy link
Contributor Author

No worries at all 😁

So basically we are talking more about a path into the content of a column, but not a key.

Yeah exactly - on Snowflake, the bit after the colon is similar to a jsonpath expression 👍

Ready to merge when you correct the Spotless complaints. Run './gradlew :spotlessApply' to fix these violations.

OK great! I've got to head off now but will sort it out first thing tomorrow. Thanks again for your lightning fast review! 😁

@manticore-projects
Copy link
Contributor

Good job @mountaincrab and I would merge immediately when the Q/A test ran though.

@manticore-projects manticore-projects merged commit 0368b9e into JSQLParser:master May 3, 2024
3 checks passed
@mountaincrab
Copy link
Contributor Author

Thanks again for the review @manticore-projects 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants