[Release 3.0] Planned Breaking Changes for 3.0 in Plugin #3248

Open · 6 of 8 tasks

peterzhuamazon opened this issue Jan 16, 2025 · 20 comments
peterzhuamazon (Member) commented Jan 16, 2025

[Release 3.0] Planned Breaking Changes for 3.0 in Plugin

As mentioned in this META Issue, we want to track and increase transparency around the breaking changes planned for the 3.0 release across participating plugins.

Please document all planned 3.0 breaking changes related to your plugin here, including detailed information about each change, its expected impact, and any guidance that helps users or other teams adapt.

If needed, please also prepare PRs for the documentation-website as part of the 3.0 release process.

Overall, this would help us deliver a smooth and seamless upgrade experience.

Thanks.

Breaking behavior in 3.0+

penghuo (Collaborator) commented Jan 16, 2025

IMO, OpenSearch DSL format support could be considered deprecated in 3.0 (a request sketch follows the list below). The reasons are:

  • The response is not structured and does not align with other supported formats, such as JDBC or CSV.
  • Query support is limited; only queries that can be fully translated to OpenSearch DSL are supported.
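For context, a minimal sketch of what the same query looks like in the different response formats, including the DSL-style json format discussed here. The host, credentials, and the "accounts" index are illustrative assumptions, not part of the proposal:

```python
# Hedged sketch: issue one query in several response formats against the SQL plugin.
# Host, credentials, and the "accounts" index are assumptions for illustration only.
import requests

query = {"query": "SELECT firstname, lastname FROM accounts LIMIT 5"}

for fmt in ("jdbc", "csv", "json"):  # "json" is the DSL-style response discussed above
    resp = requests.post(
        f"https://localhost:9200/_plugins/_sql?format={fmt}",
        json=query,
        auth=("admin", "admin"),  # adjust auth/TLS settings for your cluster
        verify=False,
    )
    print(fmt, resp.text[:200])
```

The jdbc and csv outputs are tabular, while the json output mirrors a raw search response, which is the structural mismatch called out above.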

penghuo (Collaborator) commented Jan 16, 2025

SQL Plugin Breaking Change Tracker

As part of the OpenSearch 3.0 release plan, we intend to introduce breaking changes. To align with the release timeline, we have defined the following review and decision process:

1. Breaking Change Proposal Collection

We are collecting breaking change proposals. Please add your comments to this thread.
The collection phase will conclude on January 24, 2025.

2. Review and Finalization

A meeting will be scheduled during the last week of January 2025 to review the proposed changes with the maintainers.
The list of breaking changes will be finalized by January 31, 2025.

Criteria for Breaking Change Candidates

Features that meet any of the following criteria are considered strong candidates for breaking changes:

  • Low Usage or Value: Features that are rarely used or provide minimal benefit to users.
  • Technical Debt: Features that increase system complexity or hinder maintainability.
  • Misalignment with Future Goals: Features that conflict with the product’s long-term vision or updated design principles.

penghuo removed the untriaged label on Jan 23, 2025
penghuo (Collaborator) commented Jan 23, 2025

Settings

PPL

  • The AD and KMeans commands could be deprecated. The ML command includes the functionality of both AD and KMeans, making the existing AD and KMeans commands redundant.

Connector

The Prometheus connector and Spark connector are considered experimental features, as they have limited adoption and do not align with our long-term goals. We could consider deprecating them.

vamsimanohar (Member) commented Jan 27, 2025

In 3.0, we should deprecate the setting that switches from scroll to PIT and make PIT the default (an illustrative settings call is sketched below).
https://github.com/opensearch-project/sql/blob/main/common/src/main/java/org/opensearch/sql/common/setting/Settings.java#L26
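For illustration only, flipping such a setting would go through the cluster settings API roughly as below; the setting key and value are assumptions, and the real key is the one defined in the linked Settings.java:

```python
# Hedged sketch: update a transient cluster setting over REST.
# "plugins.sql.pagination.api" and "pit" are assumed/illustrative names; use the
# actual key defined in the linked Settings.java.
import requests

body = {"transient": {"plugins.sql.pagination.api": "pit"}}
resp = requests.put(
    "https://localhost:9200/_cluster/settings",
    json=body,
    auth=("admin", "admin"),
    verify=False,
)
print(resp.json())
```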

penghuo (Collaborator) commented Jan 28, 2025

Breaking behavior in 3.0+

Reference

ps48 (Member) commented Feb 7, 2025

@penghuo While we were planning for 3.0 breaking changes and support on Dashboards plugins, we saw that the SQL team is removing PromQL support in PPL. While investigating the implications of these changes, we saw that the metrics page uses this heavily for visualization support from Prometheus. We don't yet have a roadmap for Prometheus support on the metrics page, but we also don't have a plan to deprecate this feature. Here are some examples of Prometheus queries:

  1. Get Prometheus datasource
     show datasources | where CONNECTOR_TYPE="PROMETHEUS" | fields DATASOURCE_NAME
  2. Get metrics
     source = my_prometheus.information_schema.tables
  3. Get labels
     describe my_prometheus.go_gc_duration_seconds | fields COLUMN_NAME
  4. Fetch metric data
     source = my_prometheus.query_range('avg (go_gc_duration_seconds)', 1738797632, 1738884032, '1h')

Questions:

  1. Is our understanding correct that the removal of PromQL support for the Prometheus connector will break the metrics page based on the above queries?
  2. If this breaks the metrics page, is there a workaround so these queries can be expressed in PPL?
  3. Is it possible to not deprecate this feature until the roadmap for the metrics page is clear?

y3n11 commented Feb 13, 2025

Hi OpenSearch team,

I would like to better understand the implications of the removal of the SparkSQL connector. We are currently using OpenSearch along with PySpark in a production application, querying data with the library "opensearch-spark-30_2.12-1.3.0.jar" and spark.read.

Can we still use this approach with OpenSearch 3.0.0, or will we need to make changes?

Thank you in advance! @peterzhuamazon

peterzhuamazon (Member, Author) commented

> Hi OpenSearch team,
>
> I would like to better understand the implications of the removal of the SparkSQL connector. We are currently using OpenSearch along with PySpark in a production application, querying data with the library "opensearch-spark-30_2.12-1.3.0.jar" and spark.read.
>
> Can we still use this approach with OpenSearch 3.0.0, or will we need to make changes?
>
> Thank you in advance! @peterzhuamazon

That would be a question for @penghuo @noCharger @ps48.

Thanks.

penghuo (Collaborator) commented Feb 17, 2025

> Hi OpenSearch team,
>
> I would like to better understand the implications of the removal of the SparkSQL connector. We are currently using OpenSearch along with PySpark in a production application, querying data with the library "opensearch-spark-30_2.12-1.3.0.jar" and spark.read.
>
> Can we still use this approach with OpenSearch 3.0.0, or will we need to make changes?
>
> Thank you in advance! @peterzhuamazon

No impact. The SparkSQL connector is different from opensearch-spark-30_2.12-1.3.0.jar. The original idea of the SparkSQL connector was to provide a way for users to write Spark SQL queries inside OpenSearch, for instance: source = spark.sql("select * from tbl")
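For clarity, the unaffected path y3n11 describes looks roughly like the sketch below: Spark reads from OpenSearch through the opensearch-spark (opensearch-hadoop) connector on the client side, independently of the SQL plugin's SparkSQL connector. The data source name and option keys here are assumptions; check the opensearch-hadoop documentation for the exact ones, and the connector jar must be on the Spark classpath.

```python
# Hedged sketch of the client-side Spark read path, which does not go through the
# SQL plugin's SparkSQL connector. Format/option names are assumptions based on the
# opensearch-spark-30_2.12 (opensearch-hadoop) connector; verify against its docs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("opensearch-read-example").getOrCreate()

df = (
    spark.read.format("opensearch")            # assumed data source name
    .option("opensearch.nodes", "localhost")   # assumed option keys
    .option("opensearch.port", "9200")
    .load("movies")                            # index to read
)
df.show()
```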

penghuo (Collaborator) commented Feb 17, 2025

> @penghuo While we were planning for 3.0 breaking changes and support on Dashboards plugins, we saw that the SQL team is removing PromQL support in PPL. While investigating the implications of these changes, we saw that the metrics page uses this heavily for visualization support from Prometheus. We don't yet have a roadmap for Prometheus support on the metrics page, but we also don't have a plan to deprecate this feature. Here are some examples of Prometheus queries:
>
>   1. Get Prometheus datasource
>      show datasources | where CONNECTOR_TYPE="PROMETHEUS" | fields DATASOURCE_NAME
>   2. Get metrics
>      source = my_prometheus.information_schema.tables
>   3. Get labels
>      describe my_prometheus.go_gc_duration_seconds | fields COLUMN_NAME
>   4. Fetch metric data
>      source = my_prometheus.query_range('avg (go_gc_duration_seconds)', 1738797632, 1738884032, '1h')
>
> Questions:
>
>   1. Is our understanding correct that the removal of PromQL support for the Prometheus connector will break the metrics page based on the above queries?
>   2. If this breaks the metrics page, is there a workaround so these queries can be expressed in PPL?
>   3. Is it possible to not deprecate this feature until the roadmap for the metrics page is clear?

Makes sense. We will pause deprecating the PromQL connector until we have a clearer understanding.

parked-toes commented

Hi OpenSearch team,

I'm a bit scared to see native format=json deprecated. We use it to retrieve data from objects in arrays:

Having a document:

"movies": [
  { "title": "A" },
  { "title": "B" },
  { "title": "C" }
]

SELECT movies.title in jdbc format returns only the first title, whereas in json format I get the whole array of titles, which I then parse.

Will there be an alternative after the deprecation?
Note: I'm limited to SQL language for queries.
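For reference, a rough sketch of the workflow being described, under the assumption that format=json returns a raw search-style response whose hits carry the full _source; the "media" index name is illustrative and the exact response shape should be verified on your cluster:

```python
# Hedged sketch: fetch the DSL-style (format=json) response and pull every title out
# of the nested "movies" array client-side. The response shape and the "media" index
# name are assumptions for illustration.
import requests

resp = requests.post(
    "https://localhost:9200/_plugins/_sql?format=json",
    json={"query": "SELECT movies.title FROM media"},
    auth=("admin", "admin"),
    verify=False,
).json()

titles = [
    movie.get("title")
    for hit in resp.get("hits", {}).get("hits", [])
    for movie in hit.get("_source", {}).get("movies", [])
]
print(titles)
```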

LantaoJin (Member) commented

The count() aggregate function returns BIGINT in Calcite, which means that in 3.0.0 the result will be a Long type instead of Integer.
https://github.com/apache/calcite/blob/48648b391367eeb7910e934cc73dabbe1d7370c0/core/src/main/java/org/apache/calcite/sql/fun/SqlCountAggFunction.java#L87-L89
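A small hedged sketch of the client-side impact: in the default jdbc-format response the value arrives in the same datarows position either way, but statically typed consumers should read the count() column as a 64-bit long rather than a 32-bit int. Host, credentials, and the "accounts" index are illustrative assumptions.

```python
# Hedged sketch: read a count() result so the code works whether the column is typed
# as integer (2.x) or bigint/long (3.0). Host/index names are illustrative.
import requests

resp = requests.post(
    "https://localhost:9200/_plugins/_sql",
    json={"query": "SELECT count(*) AS cnt FROM accounts"},
    auth=("admin", "admin"),
    verify=False,
).json()

# Python ints are unbounded, so the widening is transparent here; JDBC-style clients
# should bind the column to a long instead of an int.
print(resp["schema"][0], int(resp["datarows"][0][0]))
```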

LantaoJin (Member) commented Mar 8, 2025

Some behaviours differ between PPL and databases, which means PPL-on-OpenSearch and PPL-on-Spark differ as well.

In the stats command of PPL-on-OpenSearch, the result order is quite different from other databases.
As an example, for the command stats count() by colA, colB:

  1. The sequence of the output schema is different.

In PPL v2, the sequence of the output schema is count, colA, colB. But in most databases, the sequence of the output schema is colA, colB, count.

  2. The output order is different.

In PPL v2, the output results are ordered by colA + colB. But in most databases, the output order is arbitrary; users must add an ORDER BY clause after the GROUP BY clause to keep the results aligned.

In database/Calcite:

select deptno,job, MAX(SAL) max_sal from emp group by deptno,job;
+--------+-----------+---------+
| DEPTNO |    JOB    | MAX_SAL |
+--------+-----------+---------+
| 20     | CLERK     | 1100.00 |
| 30     | SALESMAN  | 1600.00 |
| 20     | MANAGER   | 2975.00 |
| 30     | MANAGER   | 2850.00 |
| 10     | MANAGER   | 2450.00 |
| 20     | ANALYST   | 3000.00 |
| 10     | PRESIDENT | 5000.00 |
| 30     | CLERK     | 950.00  |
| 10     | CLERK     | 1300.00 |
+--------+-----------+---------+

In PPL v2, the stats-by command sorts by the by-expressions by default.

source = opensearch_dashboards_sample_data_flights | stats count() by DestCountry,OriginWeather
"datarows": [
    [
      7,
      "AE",
      "Clear"
    ],
    [
      3,
      "AE",
      "Cloudy"
    ],
    [
      5,
      "AE",
      "Damaging Wind"
    ],
...
    [
      208,
      "CN",
      "Clear"
    ],
    [
      187,
      "CN",
      "Cloudy"
    ],
    [
      79,
      "CN",
      "Damaging Wind"
    ],
...

@penghuo Should we keep alignment with the V2 behaviour in the new Calcite implementation? The extra ordering operator could impact performance.
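Whichever default is chosen, queries that rely on a particular order can state it explicitly with a trailing sort stage; a hedged sketch against the PPL endpoint, where host and credentials are illustrative assumptions:

```python
# Hedged sketch: pin the result order explicitly instead of relying on the default
# ordering of stats-by. Host/credentials are illustrative assumptions.
import requests

ppl = (
    "source = opensearch_dashboards_sample_data_flights "
    "| stats count() by DestCountry, OriginWeather "
    "| sort DestCountry, OriginWeather"
)
resp = requests.post(
    "https://localhost:9200/_plugins/_ppl",
    json={"query": ppl},
    auth=("admin", "admin"),
    verify=False,
)
print(resp.json()["datarows"][:5])
```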

LantaoJin (Member) commented

#3403

ps48 (Member) commented Mar 11, 2025

@penghuo @seankao-az our CIs for Workbench started failing for 3.0-alpha1 when DSL format was deprecated. I wasn't aware that the deprecated DSL format was actually _plugins/_sql?format=json. Can you please verify that this JSON format is actually DSL? If so, I'll remove the corresponding feature from workbench as well. cc: @vamsimanohar

penghuo (Collaborator) commented Mar 11, 2025

> @penghuo @seankao-az our CIs for Workbench started failing for 3.0-alpha1 when DSL format was deprecated. I wasn't aware that the deprecated DSL format was actually _plugins/_sql?format=json. Can you please verify that this JSON format is actually DSL? If so, I'll remove the corresponding feature from workbench as well. cc: @vamsimanohar

@ps48 Yes, the JSON format is the DSL format. Could you elaborate more on the failed Workbench feature? If Workbench depends on the DSL feature, let's remove it.

ps48 (Member) commented Mar 11, 2025

@penghuo Workbench has a capability to download SQL query results in JSON format. This makes a call to SQL at the above-mentioned endpoint. I will create a PR to remove this.

penghuo (Collaborator) commented Mar 11, 2025

> 2. The output order is different.
>    In PPL v2, the stats-by command sorts by the by-expressions by default.

@LantaoJin, by default OpenSearch composite buckets are sorted by their natural ordering; the PPL stats implementation keeps the input dataframe order.

IMO, PPL stats does not need to keep the dataframe order.

LantaoJin (Member) commented

> > 2. The output order is different.
> >    In PPL v2, the stats-by command sorts by the by-expressions by default.
>
> @LantaoJin, by default OpenSearch composite buckets are sorted by their natural ordering; the PPL stats implementation keeps the input dataframe order.
>
> IMO, PPL stats does not need to keep the dataframe order.

Hmm, I think we should still keep alignment with the V2 behavior, because when aggregation pushdown is enabled, the results of the stats-by command are sorted by their natural ordering (the same behavior as composite buckets). So if PPL stats-by doesn't keep the order, it would lead to different behaviors.
