Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Publish opensearch-knn jar #1351

Open
erezgong opened this issue Dec 13, 2023 · 18 comments
Open

[FEATURE] Publish opensearch-knn jar #1351

erezgong opened this issue Dec 13, 2023 · 18 comments
Assignees

Comments

@erezgong
Copy link

Is your feature request related to a problem?
Cannot use opensearch-knn as a regular dependency in maven and IntelliJ.

What solution would you like?
Publish jar, like you do in other plugins.

@prudhvigodithi

@prudhvigodithi
Copy link
Member

prudhvigodithi commented Dec 14, 2023

Adding @vamshin @navneet1v to please take look at this issue.

@navneet1v
Copy link
Collaborator

navneet1v commented Dec 14, 2023

@erezgong we do publish the jars. We have another plugin neural search that takes dependency on k-NN plugin. Ref: https://github.com/opensearch-project/neural-search/blob/main/build.gradle#L143

Also publishing actions for gradle: https://github.com/opensearch-project/k-NN/blob/main/build.gradle#L90-L122 which is called here: https://github.com/opensearch-project/k-NN/blob/main/.github/workflows/maven-publish.yml

So not sure what exactly we need with this issue?

@prudhvigodithi

@prudhvigodithi
Copy link
Member

prudhvigodithi commented Dec 14, 2023

Hey @navneet1v, the published type is zip and the gradle logic uses the zip file, unzip it and uses the jar inside the zip file.

Notice for job-scheduler we have both the jar and zip published.

The ask is to do the same for k-NN to publish the jar file as well (Should be under https://aws.oss.sonatype.org/content/repositories/releases/org/opensearch/opensearch-knn/).

@navneet1v
Copy link
Collaborator

@prudhvigodithi
Copy link
Member

@navneet1v
Copy link
Collaborator

@prudhvigodithi on checking further I can see that only job-schedular plugin is publishing the jars. All other plugins are publishing the zips.

In the light of that evidence I don't think so k-NN should publish the jars separately.

@prudhvigodithi
Copy link
Member

@navneet1v no there are more example opensearch-ml-client, opensearch-custom-codecs, opensearch-geo, for more check this link https://aws.oss.sonatype.org/content/repositories/releases/org/opensearch/, Also it depends on the plugin team based on the use case.

@navneet1v
Copy link
Collaborator

navneet1v commented Dec 31, 2023

@prudhvigodithi

Fyi Ml-client is not a plugin. Its more of a library to call the ml apis from plugin to interact with ml commons.

plugins that i dont see is ml commons, sql, alterting, security.

here is my understanding if all the jars are present in the zip then why to publish the jars separately.

There was campaign which was started after which all the plugins started publishing their zips from plugin repo only and not the opensearch ci. The plugins who are still publishing the jars could have been from left over code where we have added the jar publish code so that other plugins can take dependency on.

I don’t see any precedent from opensearch project where plugins needs to publish their jars mandatorily.

Please provide a strong usecase why k-nm should publish the jars too when zip have the same thing in it

@prudhvigodithi
Copy link
Member

prudhvigodithi commented Dec 31, 2023

Thanks @navneet1v, FYI this ask was from @erezgong I'm just trying to emphasize on a point (as I was tagged in the issue) on which its not very hard to publish a jar of k-nn to maven when k-nn acts as an upstream, even neural search uses k-nn as upstream which adds a logic to crack open the zip and use the jar, Refer: https://github.com/opensearch-project/neural-search/blob/main/build.gradle#L161-L165
Screenshot 2023-12-30 at 5 14 33 PM.

@navneet1v also here is the campaign opensearch-project/opensearch-build#1916 to publish the plugin zips to maven which is an enhancement and I dont think jar publish code is the left over code. Also if this the ask from the customer why not? :) I would suggest @erezgong to please add the usecase that would help k-nn team to get more details.

Adding @dblock @bbarani @vamshin

@prudhvigodithi
Copy link
Member

Also running the task ./gradlew publishNebulaPublicationToSnapshotsRepository should publish the jars to maven repo.
Screenshot 2023-12-30 at 5 38 20 PM

Thanks

@jmazanec15
Copy link
Member

To only depend on k-NN jar, see neural example here: https://github.com/opensearch-project/neural-search/blob/main/build.gradle#L253. This should be good enough. Closing

@kannangce
Copy link

kannangce commented Jan 28, 2025

@prudhvigodithi

Thanks a lot for the effort that you put on trying to make this happen.

I need this for our customer, who uses maven and using zip as a dependency for a maven project. Without this jar, we'll have to duplicate KNNQueryBuilder in our code base.

Seems like in the libraries like job-scheduler the artifact is only zip.
With knn,

  1. Will publishing jar along with zip makes sense.
  2. Or, publishing only jar and updating the dependent libraries to make use of jar makes sense.

I'm trying to make this happen. Will you be able to provide me with the pointers. I'll get it done and raise a PR for this.

@jmazanec15
Copy link
Member

Sure thanks @kannangce - reopening

@kannangce
Copy link

Raised this PR. Changes as suggested by @prudhvigodithi .

@jmazanec15
Copy link
Member

@kannangce One concern I have with this is that the jar will not contain the native libraries. So, if they are used, it could break. What does the customer need the dependency for? In general, if its a client side application, we recommend using the Java client. See https://github.com/opensearch-project/opensearch-java/tree/main/samples/src/main/java/org/opensearch/client/samples/knn

@kannangce
Copy link

@jmazanec15 ,

This is client side. So native libraries is not required.

We're forming the dynmaic queries using classes implementing AbstractQueryBuilder. In that context, we would require KNNQueryBuilder and related classes.

Those classes are part of knn-library and not part of the opensearch-java. So using this would be the best option for us, without duplicating the existing code.

@jmazanec15
Copy link
Member

@kannangce but if its client side, why do you need KNNQueryBuilder? AbstractQueryBuilder is used to create the lucene query on the server side. If you just need to build a rest request, https://github.com/opensearch-project/opensearch-java/blob/main/java-client/src/generated/java/org/opensearch/client/opensearch/_types/query_dsl/KnnQuery.java is recommended.

@kannangce
Copy link

The current code base that the client has heavily leverages QueryBuilders, where they create multiple queries based on their DSL, and compose together using builders, which is a easy way to compose multiple builder, for example, QueryBuilders.boolQuery().should(queryBuilder1).should(queryBuilder2)....

So when the need for Knn Query comes in, there are 2 options for them,

  1. Move to KnnQuery(and thereby moving all the implementation using the respective query rather than the builders).
  2. Figure out a way to get KnnQueryBuilder.

Option#1 is costly and requires entire re-write. For Option#2, all it requires is having knn-library as a jar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants