gilek/vector-search-demo

Elasticsearch hybrid search demo

Presentation

Screenshot from 2025-02-18 20-17-36

A demo of hybrid search in Elasticsearch. The project was never meant to be published, but over time I came to think that parts of it might be useful to someone.

The goal of the project was to allow users to search through a collection of t-shirt photos. The input was just image files, without descriptions, brand names, colors, etc. The images were taken from the Kaggle Fashion dataset and downsized to 640px. The dataset, along with some of the original data, is available at Hugging Face.

The idea was to test two options:

  • use the BLIP model to generate image descriptions, then convert them to vector representations with an SBERT model. BLIP generates descriptions only in English, so the paraphrase-multilingual-MiniLM-L12-v2 model was chosen to support multiple languages,
  • use the CLIP model, which combines the above approaches in a single step. Text and images are embedded in the same vector space, so no additional model is needed.
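
To illustrate what a shared vector space buys you, here is a minimal sketch of ranking images against a text query by vector similarity. It assumes cosine similarity as the metric (a common choice, not confirmed by this project), and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical. In practice the vectors would come from the embedding model; here they are plain lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, image_vecs):
    """Return (image_id, score) pairs sorted from most to least similar.

    image_vecs is a dict mapping a hypothetical image ID to its embedding.
    """
    scored = [
        (image_id, cosine_similarity(query_vec, vec))
        for image_id, vec in image_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the query text and the images live in the same space, the same function works whether the query vector was produced from text or from another image.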

Lexical search is based on data generated by the BLIP model.
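
As a rough sketch of the two kinds of request involved, the helpers below build an Elasticsearch search body for a lexical `match` query and for an approximate kNN vector query. The field names `description` and `description_vector` and the helper names are assumptions made for illustration, not the project's actual mapping, and the `knn` search option assumes an Elasticsearch 8.x cluster with a `dense_vector` field.

```python
def lexical_query(text, field="description", size=10):
    """Search body for a full-text match on a generated description.

    The field name is hypothetical; adjust it to the real index mapping.
    """
    return {
        "size": size,
        "query": {"match": {field: {"query": text}}},
    }


def knn_query(vector, field="description_vector", k=10, num_candidates=100):
    """Search body for an approximate nearest-neighbour lookup.

    The field name is hypothetical; num_candidates must be at least k.
    """
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        "size": k,
        "knn": {
            "field": field,
            "query_vector": list(vector),
            "k": k,
            "num_candidates": num_candidates,
        },
    }
```

Each body can be sent to the index's `_search` endpoint; the two result lists are then available for merging on the client side.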

For hybrid search, Elasticsearch provides an RRF (Reciprocal Rank Fusion) scorer, but it requires a paid license, so I used my own naive implementation.
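
For readers unfamiliar with the technique, here is a minimal sketch of Reciprocal Rank Fusion done on the client side. It is not the code used in this repository: the function name `rrf_fuse` is hypothetical, and the constant `k=60` is the value commonly used for RRF, assumed here rather than taken from the project. Each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Usage: fuse a lexical result list with a vector result list.
lexical_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused_hits = rrf_fuse([lexical_hits, vector_hits])
```

Because RRF uses only ranks, it does not need the lexical and vector scores to be on comparable scales, which is why it is a popular choice for hybrid search.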

How to set it up

  1. make run builds and runs the API, UI and Elasticsearch containers,
  2. make images imports a set of images from the Hugging Face dataset,
  3. make import imports the output data from the prepare-data.py script into Elasticsearch.

By default, the UI is available at http://localhost:8080.
