
Conversation

@davidm-db
Contributor

What changes were proposed in this pull request?

This PR aims to re-enable SQL Scripting in Spark 4.1, ahead of its RC.

Why are the changes needed?

There was a previous misunderstanding/miscommunication that I did not notice in time. The feature was disabled because it was thought to be not ready, due to unclosed JIRA items.

However, it was just a case of the JIRA items not being fully up to date. I have cleaned up and classified the remaining work items (children of SPARK-48338) into 4 categories:

  • M0 - basic support
  • M1 - features and changes required to enable SQL Scripting by default
  • M2 - follow-up improvements and additional functionalities that are non-fundamental and should not block M1
  • M3 - potential improvements for the future, need investigation

M0 and M1 are done, meaning the feature is stable, useful, and ready to be used.

M2 will improve some aspects of using it by introducing newer, more user-friendly statements, like SIGNAL/RESIGNAL and GET DIAGNOSTICS (see the illustrative sketch below), as well as some very minor optimizations. However, even without those, the feature is ready to be used as is.
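
For a sense of what those M2 statements could look like, here is an illustrative sketch only: SIGNAL/RESIGNAL and GET DIAGNOSTICS are not part of this PR, so the syntax below follows SQL/PSM conventions and may differ from what Spark eventually ships; the table and condition details are hypothetical.

```sql
-- Illustrative sketch only: SIGNAL and GET DIAGNOSTICS are M2 follow-ups,
-- not part of this PR; syntax follows SQL/PSM conventions and may differ
-- from what Spark eventually ships. Table `orders` is hypothetical.
BEGIN
  DECLARE rows_updated INT DEFAULT 0;
  UPDATE orders SET status = 'SHIPPED' WHERE order_id = 42;
  -- GET DIAGNOSTICS would expose statement-level info such as row counts.
  GET DIAGNOSTICS rows_updated = ROW_COUNT;
  IF rows_updated = 0 THEN
    -- SIGNAL would raise a user-defined condition from within the script.
    SIGNAL SQLSTATE '02000' SET MESSAGE_TEXT = 'No matching order found';
  END IF;
END;
```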

The only missing aspect is documentation, for which we had a PR (#50592), but it was closed. I'll work on re-submitting it.

Does this PR introduce any user-facing change?

I am not 100% sure how this classifies: the feature was already present in 4.0, but was not enabled by default.
In any case, the feature is completely orthogonal to all other standalone statement execution paths, so it only adds to Spark's functionality.
There is no difference in behavior for any standalone SQL statement compared to when SQL Scripting is turned off.
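
To make the orthogonality point concrete, here is a minimal sketch of how a user opts in today, assuming the config name spark.sql.scripting.enabled and a hypothetical table `t`; standalone statements outside a BEGIN ... END block are unaffected.

```sql
-- Minimal sketch, assuming the config name is spark.sql.scripting.enabled
-- and using a hypothetical table `t`.
SET spark.sql.scripting.enabled = true;

-- A SQL script is a BEGIN ... END compound statement.
BEGIN
  IF (SELECT COUNT(*) FROM t) > 0 THEN
    SELECT 'table t is non-empty';
  ELSE
    SELECT 'table t is empty';
  END IF;
END;

-- Standalone statements outside BEGIN ... END behave exactly as before,
-- whether or not the config is enabled.
SELECT * FROM t;
```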

How was this patch tested?

SQL Scripting has thorough test coverage.
CI ensures the feature does not affect any other features.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Nov 21, 2025
@davidm-db
Contributor Author

@cloud-fan
Contributor

Users can already use scripting by enabling the config, so we need documentation anyway. I don't think this should block the 4.1 RC, but if we can finish the documentation before the vote passes, I think re-enabling it is a reasonable choice.

@davidm-db
Contributor Author

but if we can finish the documentation before the vote passes, I think re-enabling it is a reasonable choice

@cloud-fan I just created a PR based on Serge's previous proposal: #53155

@dongjoon-hyun
Member

Thank you for polishing the feature with the documentation and for proposing this, @davidm-db.

As the release manager, I'll block this until I can re-verify it. For now, please consider this a -1, because there has not been enough time for the community to re-evaluate your suggestion yet.

Please note that I mostly agree with @cloud-fan's comment, and I don't think this is a blocker for Apache Spark 4.1.0 itself. We could also include this and the documentation as part of Apache Spark 4.2.0 instead.

@dongjoon-hyun
Member

In addition, please use a different JIRA ID instead of SPARK-54261.

@dongjoon-hyun
Member

Users can already use scripting by enabling the config, so we need documentation anyway.

Just for the record, I reviewed and merged the documentation PR into the master branch for Apache Spark 4.2.0. I tried to deliver the documentation to branch-4.1 as well, but there is a conflict with that branch. So, the documentation is not in branch-4.1, and there is no documentation PR targeting branch-4.1 so far either.
