Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setting rescore as false in on_disk knn_vector query #2399

Merged
merged 4 commits into from
Jan 22, 2025

Conversation

e-emoto
Copy link
Contributor

@e-emoto e-emoto commented Jan 16, 2025

Description

This change adds a flag to RescoreContext for tracking if rescore has been disabled by the KNN query parameters. This flag is set to not skip rescore by default, and is only set to skip rescore if "rescore": false is in the KNN query. UTs were added for the KNNQueryBuilderParser that covers the different cases for rescore. The below steps were also run with a debugging cluster to verify that the rescore was being run/skipped correctly based on the query parameters.

Manual Testing Steps

curl -X PUT "localhost:9200/test_index" -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "test_field1": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "innerproduct",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x"
      },"test_field2": {
        "type": "knn_vector",
        "dimension": 4,
        "space_type": "innerproduct",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x"
      }
    }
  }
}
'

curl -X PUT "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_index": "test_index" } }
{ "test_field1": [1.5, 5.5, 3.5, 2.5, 5.5, 4.5, 7.5, 1.5], "test_field2": [1.5, 5.5, 3.5, 2.5]}
{ "index": { "_index": "test_index" } }
{ "test_field1": [2.5, 3.5, 7.5, 4.5, 1.5, 3.5, 5.5, 6.5], "test_field2": [2.5, 3.5, 7.5, 4.5]}
{ "index": { "_index": "test_index" } }
{ "test_field1": [4.5, 5.5, 1.5, 6.5, 3.5, 4.5, 2.5, 1.5], "test_field2": [4.5, 5.5, 1.5, 6.5]}
{ "index": { "_index": "test_index" } }
{ "test_field1": [1.5, 5.5, 2.5, 1.5, 3.5, 7.5, 5.5, 1.5], "test_field2": [1.5, 5.5, 2.5, 1.5]}
{ "index": { "_index": "test_index" } }
{ "test_field1": [1.5, 5.5, 4.5, 2.5, 3.5, 2.5, 6.5, 7.5], "test_field2": [1.5, 5.5, 4.5, 2.5]}
{ "index": { "_index": "test_index" } }
{ "test_field1": [2.5, 3.5, 5.5, 7.5, 1.5, 4.5, 3.5, 6.5], "test_field2": [2.5, 3.5, 5.5, 7.5]}
'

rescore skipped:

curl -X GET "localhost:9200/test_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "knn": {
      "test_field1": {
        "vector": [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 ],
        "k": 3,
        "rescore": false
      }
    }
  }
}
'

rescore does happen:

curl -X GET "localhost:9200/test_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "knn": {
      "test_field2": {
        "vector": [ 4.0, 3.0, 2.0, 1.0 ],
        "k": 3
      }
    }
  }
}
'

rescore does happen:

curl -X GET "localhost:9200/test_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "knn": {
      "test_field1": {
        "vector": [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 ],
        "k": 3,
        "rescore": {
            "oversample_factor": 2.5
        }
      }
    }
  }
}
'

Related Issues

Resolves #2360

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @e-emoto. Lets add a test somewhere in https://github.com/opensearch-project/k-NN/tree/main/src/test/java/org/opensearch/knn/integ that confirms that it actually does end up disabling re-scoring. Confirm that it fails before and works after.

These tests would provide a good template: https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/integ/ModeAndCompressionIT.java


// Rescore context to be used when rescoring should be explicitly disabled
public static final RescoreContext EXPLICITLY_DISABLED_RESCORE_CONTEXT = RescoreContext.builder()
.oversampleFactor(DEFAULT_OVERSAMPLE_FACTOR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we need to specify this?

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
@e-emoto e-emoto force-pushed the on-disk-rescore-false branch from 9487cf4 to 63c778c Compare January 21, 2025 23:52
jmazanec15
jmazanec15 previously approved these changes Jan 22, 2025
@jmazanec15 jmazanec15 dismissed their stale review January 22, 2025 00:15

Failing check - rerunning

@navneet1v navneet1v merged commit 387344e into opensearch-project:main Jan 22, 2025
31 checks passed
@navneet1v navneet1v added backport 2.x Bug Fixes Changes to a system or product designed to handle a programming bug/glitch labels Jan 22, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 22, 2025
* Fix setting rescore as false in on_disk knn_vector query

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Reapply Spotless Java

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Add IT for testing rescore enabled and disabled

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

---------

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
(cherry picked from commit 387344e)
e-emoto added a commit to e-emoto/k-NN that referenced this pull request Jan 22, 2025
…project#2399)

* Fix setting rescore as false in on_disk knn_vector query

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Reapply Spotless Java

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Add IT for testing rescore enabled and disabled

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

---------

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
(cherry picked from commit 387344e)
e-emoto added a commit to e-emoto/k-NN that referenced this pull request Jan 23, 2025
…project#2399)

* Fix setting rescore as false in on_disk knn_vector query

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Reapply Spotless Java

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Add IT for testing rescore enabled and disabled

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

---------

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
(cherry picked from commit 387344e)
e-emoto added a commit to e-emoto/k-NN that referenced this pull request Jan 23, 2025
…project#2399)

* Fix setting rescore as false in on_disk knn_vector query

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Reapply Spotless Java

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

* Add IT for testing rescore enabled and disabled

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>

---------

Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
(cherry picked from commit 387344e)
Signed-off-by: Ethan Emoto <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Bug Fixes Changes to a system or product designed to handle a programming bug/glitch
Projects
Status: 2.19.0
Development

Successfully merging this pull request may close these issues.

Setting rescore to false as search parameter is noop
4 participants