Lucene BBQ Integration WIP#3144
Lucene BBQ Integration WIP#3144jack-hung-lgtm wants to merge 3 commits intoopensearch-project:mainfrom
Conversation
|
Will review this PR once we have all integs test successfully run. |
|
2d0fe3a to
ecf7046
Compare
src/main/java/org/opensearch/knn/index/codec/params/KNNBBQVectorsFormatParams.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/BasePerFieldKnnVectorsFormat.java
Outdated
Show resolved
Hide resolved
...services/org.opensearch.transport.grpc.proto.request.search.query.QueryBuilderProtoConverter
Outdated
Show resolved
Hide resolved
|
Please make sure that you pass all the CIs |
src/main/java/org/opensearch/knn/index/codec/KNN9120Codec/KNN9120PerFieldKnnVectorsFormat.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/params/KNNBBQVectorsFormatParams.java
Outdated
Show resolved
Hide resolved
| if (this == x32 && engine == KNNEngine.LUCENE && version.onOrAfter(Version.V_3_6_0)) { | ||
| if (dimension <= RescoreContext.DIMENSION_THRESHOLD) { | ||
| return RescoreContext.builder() | ||
| .oversampleFactor(RescoreContext.OVERSAMPLE_FACTOR_BELOW_DIMENSION_THRESHOLD) |
There was a problem hiding this comment.
I think based on our discussion after reviewing the benchmarking results we want to use a default oversampling factor of 2.0 irrespective of the dimension of the vector.
Also, why are we adding this outside of the above if condition ?
if (modesForRescore.contains(mode)) {
...nsearch/knn/index/codec/backward_codecs/KNN990Codec/KNN990PerFieldKnnVectorsFormatTests.java
Outdated
Show resolved
Hide resolved
...t/java/org/opensearch/knn/index/codec/KNN9120Codec/KNN9120PerFieldKnnVectorsFormatTests.java
Outdated
Show resolved
Hide resolved
|
Temporarily putting as draft - streamlining implementation for the new KNN1040 codec |
1967ab9 to
43ba471
Compare
|
Opening pull request for early review. However, some CIs are still failing so I'll be debugging that in the meantime |
867af09 to
08c4d8b
Compare
...in/java/org/opensearch/knn/index/codec/params/KNN1040ScalarQuantizedVectorsFormatParams.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public boolean validate(final Map<String, Object> params) { | ||
| return encoderName != null && SUPPORTED_ENCODERS.contains(encoderName); |
There was a problem hiding this comment.
We would also need to validate the bits right? We only support 1-bit currently
There was a problem hiding this comment.
There is a validation for the LuceneBBQEncoder class that ensures the user only provides bits = 1 with BBQ, is that enough? I have a test case that also makes sure it throws exception if the user provides an invalid number of bits for BBQ
qa/restart-upgrade/src/test/java/org/opensearch/knn/bwc/IndexingIT.java
Outdated
Show resolved
Hide resolved
08c4d8b to
2fad3cc
Compare
2fad3cc to
db0a8c8
Compare
db0a8c8 to
4fc9797
Compare
4fc9797 to
84116a0
Compare
Description
Integrates Lucene's BBQ and makes it the default when possible. Building off of this PR from @adityamachiroutu: https://github.com/opensearch-project/k-NN/pull/2838/changes#diff-98b317fe02be43dadfabe498a20936d7a56046c43fef3c9533b39d01e4ab4248
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.