Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Take a closer look at JsonTuple #10405

Open
revans2 opened this issue Feb 9, 2024 · 5 comments · May be fixed by #12248
Open

[FEA] Take a closer look at JsonTuple #10405

revans2 opened this issue Feb 9, 2024 · 5 comments · May be fixed by #12248
Labels
task Work required that improves the product but is not user facing test Only impacts tests

Comments

@revans2
Copy link
Collaborator

revans2 commented Feb 9, 2024

Is your feature request related to a problem? Please describe.
We recently disabled get_json_object and from_json by default because of errors that can crop up when using them. But we didn't do the same kind of thing for json_tuple. Our json_tuple is implemented in terms of get_json_object. It is not perfect, but it probably has the same issues with escaping that get_json_object also has, which caused it to be disabled by default. We should spend some more time testing it and see if we need to also disable it by default. At least until the issues with get_json_object are fixed.

@revans2 revans2 added ? - Needs Triage Need team to review and classify task Work required that improves the product but is not user facing labels Feb 9, 2024
@mattahrens
Copy link
Collaborator

  1. Turn off json_tuple support by default given potential data quality issues.
  2. Consider re-implementing json_tuple using from_json and then re-enable by default.

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Feb 13, 2024
@revans2
Copy link
Collaborator Author

revans2 commented Feb 13, 2024

#10420 disables json_tuple by default.

Ideally we should look at pulling in Spark unit tests for json_tuple to make sure that we can match what they are doing. At a minimum we need to look at adding tests around single quotes, white space normalization, number normalization, mixed types, etc.

@revans2 revans2 removed their assignment Feb 14, 2024
@revans2 revans2 assigned revans2 and unassigned revans2 Feb 21, 2024
@jihoonson
Copy link
Collaborator

Hi @revans2, I'm trying to figure out the current state of our JSON support. Do you think we should add any tests in #10491 before we reconsider enabling json_tuple by default?

@jihoonson
Copy link
Collaborator

@revans2 I had a look, and none of those tests seem like a blocker to me to enable json_tuple by default. I'd like to add probably a couple of tests just in case before we enable json_tuple by default. The cases I currently have in mind are:

  • dictionary of ints
  • \r\n whitespace in strings
  • garbage at the end of the input lines

Let me know what you think.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 6, 2025

Sure those tests sound find to me. I think we already have tests for get_json_object that cover them, but having ones explicitly for json_tuple would be good.

This was referenced Feb 18, 2025
jihoonson pushed a commit that referenced this issue Feb 20, 2025
This PR adds tests as specified in
#10405.
Specifically, the cases addressed are: dictionary of ints, \r\n
whitespace in strings, and garbage at the end of input lines (e.g.
'{"a":100} this is valid according to spark').

Signed-off-by: Warrick He <[email protected]>
@sameerz sameerz added the test Only impacts tests label Feb 22, 2025
@warrickhe warrickhe linked a pull request Mar 3, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing test Only impacts tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants