Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLExecuteQueryOperator with Trino: Query Cancellation Due to Missing Result Fetching #46808

Open
2 tasks done
EdenKik opened this issue Feb 16, 2025 · 1 comment · May be fixed by #46836 or #46997
Open
2 tasks done

SQLExecuteQueryOperator with Trino: Query Cancellation Due to Missing Result Fetching #46808

EdenKik opened this issue Feb 16, 2025 · 1 comment · May be fixed by #46836 or #46997
Labels
area:providers kind:bug This is a clearly a bug kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet provider:trino

Comments

@EdenKik
Copy link

EdenKik commented Feb 16, 2025

Apache Airflow Provider(s)

common-sql

Versions of Apache Airflow Providers

Latest

Apache Airflow version

Latest

Operating System

Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

When using Airflow's SQLExecuteQueryOperator with Trino, a problem arises when do_xcom_push is set to False. The operator's result processing, which is tied to XComs, is skipped in this case. This violates Trino's client protocol, which requires the client to retrieve the result set before Trino can continue processing.
Because Airflow doesn't fetch the results, Trino cancels the query, returning a USER_CANCELED error, even if the underlying data modification (like an INSERT) was partially successful. Airflow incorrectly reports the task as successful despite the Trino cancellation.

Trino client protocol [https://trino.io/docs/current/client/client-protocol.html#client-protocol]

This issue appears specific to Trino and doesn't affect other databases used with the same operator.
The proposed solution is to create a dedicated TrinoOperator that directly handles Trino's protocol. Instead of relying on XComs, TrinoOperator would fetch all results in default, allowing developers to decide how to use them. Im about to open this PR soon.

Feedback and suggestions on the proposed TrinoOperator solution (or an alternative one) would be greatly appreciated.

What you think should happen instead

The proposed solution is to create a dedicated TrinoOperator that directly handles Trino's protocol.

How to reproduce

A test case confirms this behavior: an INSERT query succeeded in writing data to the Trino table, but the Trino query itself was canceled due to Airflow not retrieving the confirmation response. This resulted in a USER_CANCELED error on the Trino side, even though the data was inserted. The task was still marked as successful in Airflow.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@EdenKik EdenKik added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Feb 16, 2025
Copy link

boring-cyborg bot commented Feb 16, 2025

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers kind:bug This is a clearly a bug kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet provider:trino
Projects
None yet
1 participant