Skip to content

Avoid null cache writes for query routing mappings#963

Open
panjwanirahul wants to merge 1 commit into
trinodb:mainfrom
panjwanirahul:fix-787-null-externalurl-cache-npe
Open

Avoid null cache writes for query routing mappings#963
panjwanirahul wants to merge 1 commit into
trinodb:mainfrom
panjwanirahul:fix-787-null-externalurl-cache-npe

Conversation

@panjwanirahul
Copy link
Copy Markdown
Member

Guard query-id cache writes in routing manager against null/empty values to prevent runtime failures when query history rows have been removed by retention.

What changed

  • Updated BaseRoutingManager:
    • setBackendForQueryId(...): skip empty queryId; invalidate entry if backend is null/empty.
    • setRoutingGroupForQueryId(...): skip empty queryId; invalidate entry if routing group is null/empty.
    • setExternalUrlForQueryId(...): skip empty queryId; invalidate entry if externalUrl is null.
  • Added regression tests in TestRoutingManagerExternalUrlCache:
    • testNullExternalUrlDoesNotThrowAndFallsBackToQueryHistory
    • testNullQueryIdDoesNotThrowForExternalUrlCacheSet

Why

When query history retention deletes older records, lookup for externalUrl can return null.
The previous code attempted to cache that null value, which caused an NPE in the cache path.
The new behavior treats null/empty mappings as “not cacheable” and preserves normal fallback behavior.

Additional context and related issues

  • Related issue: Unhandled NullPointerException if request is for an unknown query ID #787
  • Behavior impact:
    • Fixes NPE in query routing cache update path.
    • Does not change successful routing behavior for valid mappings.
    • For missing mappings, cache entry is invalidated/skipped and existing fallback logic continues.
  • Local validation:
    • TestRoutingManagerExternalUrlCache passed.
    • Routing-focused suite passed (TestRoutingManagerExternalUrlCache, TestRoutingGroupSelector, TestRoutingRulesManager, TestPathFilter, TestTrinoRequestUser, TestTrinoQueryProperties).

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required, with the following suggested text:

* Fix a NullPointerException in query routing cache updates when retained query history entries are removed and external URL lookup returns null.

@cla-bot
Copy link
Copy Markdown

cla-bot Bot commented Mar 15, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@rahulpanjwani17
Copy link
Copy Markdown

@ebyhr Hey, can you please check this pull request and also re run the cla action? i just got the mail that my cla is signed
Thanks!!

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Mar 30, 2026

@cla-bot check

@cla-bot cla-bot Bot added the cla-signed label Mar 30, 2026
@cla-bot
Copy link
Copy Markdown

cla-bot Bot commented Mar 30, 2026

The cla-bot has been summoned, and re-checked this pull request!

@rahulpanjwani17
Copy link
Copy Markdown

@ebyhr thank you so much for re running the cla bot, can you also review this pull request?

@panjwanirahul
Copy link
Copy Markdown
Member Author

@ebyhr Hey!! Please get a review on this? It would be great if you could please check this.

@panjwanirahul
Copy link
Copy Markdown
Member Author

@mosabua hello, can i please get a review on this? Pending since a long time, thank you so much!

@Override
public void setBackendForQueryId(String queryId, String backend)
{
if (Strings.isNullOrEmpty(queryId)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's import isNullOrEmpty directly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public void setBackendForQueryId(String queryId, String backend)
{
if (Strings.isNullOrEmpty(queryId)) {
log.warn("Skipping backend cache update for empty queryId");
Copy link
Copy Markdown
Member

@ebyhr ebyhr Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the log level warning? What is the expected action when Trino Gateway admin sees this message?

Guard query-id cache writes in routing manager against null/empty values to prevent runtime failures when query history rows have been removed by retention.

The PR description indicates this is a normal scenario. The log level should be debug if so.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks for the review, made the changes @ebyhr

@cla-bot
Copy link
Copy Markdown

cla-bot Bot commented Apr 13, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@cla-bot cla-bot Bot removed the cla-signed label Apr 13, 2026
@panjwanirahul panjwanirahul force-pushed the fix-787-null-externalurl-cache-npe branch from 3bfc2be to 3ad9da9 Compare April 13, 2026 19:01
@cla-bot cla-bot Bot added the cla-signed label Apr 13, 2026
@panjwanirahul
Copy link
Copy Markdown
Member Author

@ebyhr can you please check this again?

@panjwanirahul
Copy link
Copy Markdown
Member Author

@mosabua can i please get a review here? Thanks much

@panjwanirahul
Copy link
Copy Markdown
Member Author

@mosabua @ebyhr can you folks please check this or assign someone? Thanks much!

Copy link
Copy Markdown
Member

@vishalya vishalya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the flow is this: The trino-gateway first checks its in-memory cache for the query, then falls back to the query history. But if both caches have expired or don't contain the query id, the router would have to reach out directly to the actual clusters to discover where that query is currently executing. So I'm puzzled—how do we actually end up in a situation where both caches miss?
Could you explain why deleting the history would stop the search for the actual backend?

@panjwanirahul
Copy link
Copy Markdown
Member Author

My understanding of the flow is this: The trino-gateway first checks its in-memory cache for the query, then falls back to the query history. But if both caches have expired or don't contain the query id, the router would have to reach out directly to the actual clusters to discover where that query is currently executing. So I'm puzzled—how do we actually end up in a situation where both caches miss? Could you explain why deleting the history would stop the search for the actual backend?

@vishalya
The discovery logic is actually a bit separate from this fix. This is really just a safety fix for an NPE—caches crash if you try to .put() a null value, which is what happens once history is purged by retention. This PR just adds a guard so the gateway doesn't crash when it hits a null mapping, allowing the normal fallbacks to continue.

@vishalya
Copy link
Copy Markdown
Member

I’m trying to trace the crash path and figure out whether we’re masking the real issue somewhere along the way.

Copy link
Copy Markdown
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash commits into one.

Guard query-id cache writes in routing manager against null/empty values
to prevent runtime failures when query history rows have been removed
by retention.

- Use direct static import for isNullOrEmpty.
- Change log level from warn to debug for empty queryId checks.
- Standardize null/empty handling across all cache setters.
- Add regression tests for null/empty external URL and query ID.

Co-authored-by: Cursor <cursoragent@cursor.com>
@panjwanirahul panjwanirahul force-pushed the fix-787-null-externalurl-cache-npe branch from 7ff4fa9 to 3a874a9 Compare May 5, 2026 18:59
@panjwanirahul
Copy link
Copy Markdown
Member Author

@ebyhr done, please check now

@panjwanirahul
Copy link
Copy Markdown
Member Author

@ebyhr @mosabua please check this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants