Improve crate searching #19

Open
Hirevo opened this issue Nov 19, 2019 · 5 comments
Labels
C-enhancement Category: Enhancement M-api Module: Programmatic API M-frontend Module: Frontend P-low Priority: Low

Comments

@Hirevo
Owner

Hirevo commented Nov 19, 2019

Currently, crate searching, both in the frontend and the programmatic API, has limitations:

  • It only operates on the crates' names.
  • It cannot search with multiple words:
    "serde json" won't yield "serde_json" as a match
  • It is a strict search (no fuzziness at all).

This issue will serve as the place where improvements to the search mechanism, like addressing these limitations, can be discussed.
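
As a starting point for discussion, here is a minimal sketch of how the multi-word and fuzziness limitations could be addressed on crate names alone. It is not Alexandrie's current code: the function names (`tokenize`, `edit_distance`, `matches`) and the `max_typos` parameter are hypothetical, and it only uses the standard library.

```rust
/// Splits a query or crate name into lowercase tokens on whitespace, `_` and `-`.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| c.is_whitespace() || c == '_' || c == '-')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Levenshtein edit distance between two strings, computed per `char`.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // `prev[j]` holds the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let insertion = curr[j] + 1;
            let deletion = prev[j + 1] + 1;
            curr.push(substitution.min(insertion).min(deletion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Returns true when every query token matches some token of the crate name,
/// either as a substring or within `max_typos` edits (assumed tolerance).
fn matches(query: &str, crate_name: &str, max_typos: usize) -> bool {
    let name_tokens = tokenize(crate_name);
    let query_tokens = tokenize(query);
    !query_tokens.is_empty()
        && query_tokens.iter().all(|q| {
            name_tokens
                .iter()
                .any(|n| n.contains(q.as_str()) || edit_distance(q, n) <= max_typos)
        })
}
```

With this sketch, `matches("serde json", "serde_json", 0)` holds, and `matches("serde jsn", "serde_json", 1)` holds thanks to the one-edit tolerance. Scanning every crate name this way is linear in the number of crates, so it would likely need an index behind it for a large registry.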

@Hirevo Hirevo added C-enhancement Category: Enhancement P-low Priority: Low M-frontend Module: Frontend M-api Module: Programmatic API labels Nov 19, 2019
@danieleades
Contributor

See rust-lang/crates.io#1270 for a nice description of how crates.io handles this.

That approach relies on some Postgres features, so it might be hard to generalise over all backends. What advantage do users of the crate get from being able to use multiple backing databases?

@Dalvany
Contributor

Dalvany commented Apr 28, 2023

Hello, I used to use Alexandrie as a frontend to a mirror of crates.io. Although the current search might be enough with few crates, it does not work well when searching through the whole mirror. For instance, when searching for log, the log crate was on the last page of something like 50 pages. I made some changes to use Elasticsearch as the search engine to improve search.
Unfortunately, I lost the sources, but if you want, I might come up with a pull request if I have time to work on it.

@Hirevo
Owner Author

Hirevo commented May 8, 2023

Hi!

I wholeheartedly agree that the search experience is not great right now: exact matches are not favored, and no relevancy criteria of any sort are taken into account.
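
To illustrate what favoring exact matches could look like, here is a small sketch that ranks hits in tiers: exact name first, then prefix matches, then other substring matches. The `MatchKind` tiers, the tie-breaking by name length, and the function names are assumptions made for illustration, not an existing part of the registry.

```rust
/// Relevance tiers, best first. The derived ordering follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum MatchKind {
    Exact,
    Prefix,
    Substring,
}

/// Classifies how `name` matches `query` (case-insensitive), or `None` if it does not.
fn classify(query: &str, name: &str) -> Option<MatchKind> {
    let (query, name) = (query.to_lowercase(), name.to_lowercase());
    if name == query {
        Some(MatchKind::Exact)
    } else if name.starts_with(&query) {
        Some(MatchKind::Prefix)
    } else if name.contains(&query) {
        Some(MatchKind::Substring)
    } else {
        None
    }
}

/// Returns the matching names, most relevant first.
fn rank<'a>(query: &str, names: &[&'a str]) -> Vec<&'a str> {
    let mut hits: Vec<(MatchKind, &'a str)> = names
        .iter()
        .filter_map(|&name| classify(query, name).map(|kind| (kind, name)))
        .collect();
    // Sort by tier, then by shorter name, then alphabetically for a deterministic order.
    hits.sort_by(|a, b| {
        a.0.cmp(&b.0)
            .then(a.1.len().cmp(&b.1.len()))
            .then(a.1.cmp(b.1))
    });
    hits.into_iter().map(|(_, name)| name).collect()
}
```

Under these assumptions, searching for `log` among `slog`, `log4rs`, `env_logger` and `log` puts `log` first instead of burying it among the other matches. Other signals, such as download counts, could be added as further sort keys.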

Expanding and improving search is something that I'd like to get around to, because I think it would be good to allow things like searching based on crate descriptions, keywords or categories.

Using a system like Elasticsearch would indeed considerably improve the experience but I worry that this would be yet another moving part in an already quite involved deployment process (we already have 3 separate pieces that users need to configure: the registry itself, the git index, and the database).
But it is possible that your experiments with it went fairly smoothly and that my worry isn't well informed.

I'm thinking maybe we can use one of the full text search engines that are implemented in Rust to have it directly integrated into the registry itself (like Meilisearch, or Tantivy).
Maybe we could make it so that it automatically synchronizes itself using the database and the git index, when the registry first starts up, and then is kept up-to-date with each crate publication.
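
The synchronization idea could be sketched roughly as below. This is a standard-library stand-in for a real engine such as Tantivy, not an integration with it: `CrateRecord`, `SearchIndex`, and their methods are hypothetical names, and the records would in practice come from the database and the git index.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical view of a crate; the real fields would be read from the database and git index.
struct CrateRecord {
    name: String,
    description: String,
    keywords: Vec<String>,
}

/// A tiny in-process inverted index: token -> names of crates containing that token.
#[derive(Default)]
struct SearchIndex {
    postings: HashMap<String, BTreeSet<String>>,
}

/// Splits text into lowercase alphanumeric tokens.
fn tokens(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}

impl SearchIndex {
    /// Startup synchronization: build the index from scratch out of all known crates.
    fn rebuild(records: impl IntoIterator<Item = CrateRecord>) -> Self {
        let mut index = Self::default();
        for record in records {
            index.insert(&record);
        }
        index
    }

    /// Indexes the name, description and keywords of one crate.
    fn insert(&mut self, record: &CrateRecord) {
        let fields = std::iter::once(record.name.as_str())
            .chain(std::iter::once(record.description.as_str()))
            .chain(record.keywords.iter().map(String::as_str));
        for field in fields {
            for token in tokens(field) {
                self.postings.entry(token).or_default().insert(record.name.clone());
            }
        }
    }

    /// Incremental update on publication: drop stale postings for the crate, then re-index it.
    fn on_publish(&mut self, record: &CrateRecord) {
        for names in self.postings.values_mut() {
            names.remove(&record.name);
        }
        self.insert(record);
    }

    /// Returns the names of crates that contain every token of the query, sorted by name.
    fn search(&self, query: &str) -> Vec<String> {
        let mut result: Option<BTreeSet<String>> = None;
        for token in tokens(query) {
            let hits = self.postings.get(&token).cloned().unwrap_or_default();
            result = Some(match result {
                Some(acc) => acc.intersection(&hits).cloned().collect(),
                None => hits,
            });
        }
        result.unwrap_or_default().into_iter().collect()
    }
}
```

In this sketch, `SearchIndex::rebuild` would run once when the registry starts, and `on_publish` would be called from the publication handler. A real engine would add persistence, relevance scoring, and fuzzy matching on top of the same two entry points.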

I hope to have a stab at something like that rather soon.

@Dalvany
Contributor

Dalvany commented May 8, 2023

I agree with your concern about having another system; it might not be worth the trouble to have a fourth piece of external software. I chose Elastic because I was quite familiar with it and I already had a small cluster.
I don't know Meilisearch, but from browsing its documentation, it seems that it is not a library to include but another system to deploy, so it has the same pitfall as Elastic.
Tantivy, though, is a library you could use inside Alexandrie's current code. I think it's the best option here.

@Dalvany
Contributor

Dalvany commented May 12, 2023

Hello, @Hirevo,
I found a patch file that contains the search changes I made. I was wrong: I didn't use Elastic but Tantivy. I'll make a pull request so you can see what it looks like.
A few things to note, though: I didn't handle uploading new crates because we didn't need that, but I can also look into it.
I also didn't build the index at start time; instead, I made an HTTP endpoint, because with a large number of crates, as in my use case, it takes several minutes to index everything.
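
For readers following along, one way to trigger a long indexing run on demand without blocking the caller is sketched below. It is not the code from the patch: `Reindexer` and `trigger` are hypothetical names, and the HTTP layer is left out, so an endpoint handler would simply call `trigger` and return immediately.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical handle that an HTTP handler could hold to start a re-index.
#[derive(Clone, Default)]
struct Reindexer {
    running: Arc<AtomicBool>,
}

impl Reindexer {
    /// Starts `job` on a background thread unless a run is already in progress.
    /// Returns `None` when busy, which an endpoint could report as a conflict.
    fn trigger<F>(&self, job: F) -> Option<JoinHandle<()>>
    where
        F: FnOnce() + Send + 'static,
    {
        // Atomically flip `running` from false to true; fail if it was already true.
        if self
            .running
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            return None;
        }
        let running = Arc::clone(&self.running);
        Some(thread::spawn(move || {
            job();
            // Note: if `job` panics, the flag stays set; a real version would guard against that.
            running.store(false, Ordering::Release);
        }))
    }
}
```

The flag keeps two multi-minute indexing runs from overlapping, while the registry can keep serving requests during the run.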

@Dalvany Dalvany mentioned this issue May 14, 2023
4 tasks
@Dalvany Dalvany mentioned this issue Jun 2, 2023
3 tasks