Skip to content

SOLR-17319 : Combined Query Feature for Multi Query Execution#3418

Merged
dsmiley merged 69 commits intoapache:mainfrom
ercsonusharma:feat_combined_query
Dec 19, 2025
Merged

SOLR-17319 : Combined Query Feature for Multi Query Execution#3418
dsmiley merged 69 commits intoapache:mainfrom
ercsonusharma:feat_combined_query

Conversation

@ercsonusharma
Copy link
Contributor

@ercsonusharma ercsonusharma commented Jul 4, 2025

https://issues.apache.org/jira/browse/SOLR-17319

Description

This feature aims to execute multiple queries of multiple kinds across multiple shards of a collection and combine their result basis an algorithm (like Reciprocal Rank Fusion). It also help resolve the issues being discussed w.r.t the previous PR, mainly around across shard documents merging. It provides more flexibility in terms of querying extending JSON Query DSL ultimately enabling Hybrid Search in a pure way solving the shortcomings.

Note: This feature is currently unsupported for non-distributed and grouping query.

Solution

  • Extended the QueryComponent to create new CombinedQueryComponent and ResponseBuilder to create new CombinedQueryResponseBuilder supports multiple response builders to hold the state and execute multiple queries.
  • In JSON Query DSL, a parameter is added to identity Combined Query request and basis that it invokes the new CombinedQueryComponent
  • CombinedQueryComponent have multiple response builders assigned for each query. These queries are first executed at the SolrSearchIndexer level and combined them using RRF for now.
  • At Shard level also, the responses for the multiple queries are merged.

Tests

  • Added tests for testing the RRF logic independently.
  • Added tests for across search index and distributed requests.
  • Added tests to assert existing behaviour of search handler's QueryComponent as well as for the newly added CombinedQueryComponent basis the flag in json query DSL.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@ercsonusharma
Copy link
Contributor Author

@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!

@alessandrobenedetti
Copy link
Contributor

Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement!

In the next couple of weeks, I should be able to give it a go and review it!

@ercsonusharma ercsonusharma requested a review from atris July 16, 2025 18:45
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the test; I like it!

I've been neglecting your PR (and of others that are non-trivial), with my attention on critical 10.0 backwards compatibility changes. But I should be getting to this...

* This contains the classes to combine the scores from search index results. Multiple
* implementation of algorithms can be added to support them.
*/
package org.apache.solr.handler.component.combine;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the CombinedQuery* classes would come here too, thus all classes associated with this are together. WDYT?

Copy link
Contributor Author

@ercsonusharma ercsonusharma Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original intention was to group the *Handler and *Component classes within the same package, as done previously.
It will affect this where we might deviate from usual convention of having solr.<ComponentName> & solr.<HandlerName>

@dsmiley
Copy link
Contributor

dsmiley commented Dec 5, 2025

BTW I'm busy tweaking some things and preparing a ~final state that's mergeable.
One thing I'm doing is reducing the forcedDistrib API as it's rather internal.
Also merging from main.
Adding changelog.
Updating javadocs; usually to simplify / remove boilerplate likely AI written stuff

Refactor SearchHandler to be more extensible and have a clearer flow.

Improve BaseDistributedSearchTestCase.query to not set "distrib" on the control client request.
@dsmiley
Copy link
Contributor

dsmiley commented Dec 7, 2025

Please take a look Sonu. I also refactored SearchHandler a bit to have a clearer flow / readability and also extensibility. I'd like to merge the refactoring portions of this PR in one go, then the rest of this here. Like... I could extract all that to be under SOLR-17982.

A TODO I didn't get to tonight is testing the different forceDistrib API with a custom SearchHandler subclass. Both testForceDistrib methods are there but need to be modified. I suspect shards.info may reflect what we want to see.

@ercsonusharma
Copy link
Contributor Author

Thanks for all the changes! So, the change would limit, having a new component only can't enable force-distributed behaviour on the existing SearchHandler.

I'd like to merge the refactoring portions of this PR in one go, then the rest of this here.

Sounds good to me.

I am still not sure if shardinfo reflects on the process method being called.

@ercsonusharma
Copy link
Contributor Author

@dsmiley - Thanks for the PR! It looks good on my end. What do you think the next step should be, considering the other PR has been open for quite a few days now?

@dsmiley
Copy link
Contributor

dsmiley commented Dec 14, 2025

#3931 should merge first (say in ~2 days), then this one will be simpler and it's ready to go. I expect I should merge this one by the end of next week..

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sonu, can you please update this from main? It will simplify this PR a good deal. It should be ready for me to merge Friday, which I'm saying to give anyone else a moment to look before merging.

@dsmiley dsmiley requested a review from atris December 17, 2025 14:33
@ercsonusharma
Copy link
Contributor Author

Sounds good, Thank you @dsmiley - done with update from main. I would appreciate it if you could trigger the workflows in the meantime.

@ercsonusharma
Copy link
Contributor Author

Workflow failures seem likely to be infrastructure-related and a flaky SolrCloud test case. Re-triggering the workflow may help it pass.

@ercsonusharma
Copy link
Contributor Author

ercsonusharma commented Dec 19, 2025

@dsmiley - I see it failed again after re-triggering the same workflows. Luckily, I found a tiny cleanup to commit. Can you help approve the workflow?

After the latest run: all passed except "Solr Tests using Crave" which is struggling to find disk space.

@dsmiley dsmiley merged commit 42cd889 into apache:main Dec 19, 2025
2 checks passed
@ercsonusharma ercsonusharma deleted the feat_combined_query branch December 19, 2025 17:49
dsmiley added a commit that referenced this pull request Dec 20, 2025
New CombinedQuerySearchHandler etc. for implementing hybrid search with reciprocal rank fusion (RRF).

See "JSON Combined Query DSL" in ref guide, and params prefixed with "combiner".

QueryComponent: refactorings to enable a subclass to customize merging shard results.

---------

Co-authored-by: Sonu Sharma <sonu_sharma2@apple.com>
Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
Co-authored-by: David Smiley <dsmiley@apache.org>
dsmiley added a commit that referenced this pull request Dec 20, 2025
New CombinedQuerySearchHandler etc. for implementing hybrid search with reciprocal rank fusion (RRF).

See "JSON Combined Query DSL" in ref guide, and params prefixed with "combiner".

QueryComponent: refactorings to enable a subclass to customize merging shard results.

---------

Co-authored-by: Sonu Sharma <sonu_sharma2@apple.com>
Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
Co-authored-by: David Smiley <dsmiley@apache.org>
(cherry picked from commit 42cd889)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:search client:solrj documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants