A demo of hybrid search built on Elasticsearch. The project was never meant to be published, but over time I came to think that parts of it might be useful to someone.
The goal of the project was to let users search through a collection of t-shirt photos. The input was just image files, with no descriptions, brand names, colors, etc. The images come from the Kaggle Fashion dataset, except that they have been downsized to 640px. The dataset, along with some of the original data, is available on Hugging Face.
The idea was to test two options:
- use the BLIP model to generate image descriptions and then convert them into vector representations with an SBERT model. BLIP generates text only in English; to support multiple languages, the paraphrase-multilingual-MiniLM-L12-v2 model was chosen (a sketch of this pipeline is shown after the list),
- use the CLIP model, which combines the above steps into one: both the text and the image are embedded into the same vector space, so an additional model is not needed (see the CLIP sketch below).
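A minimal sketch of the first option, assuming the commonly used BLIP captioning checkpoint and the sentence-transformers library; the model names and the example image path are assumptions, not the exact code from prepare-data.py:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from sentence_transformers import SentenceTransformer

# BLIP generates an English caption for the image.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# The multilingual SBERT model maps the caption into a vector space shared
# across languages, so queries in other languages land near the English caption.
sbert = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

image = Image.open("tshirt.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption_ids = blip.generate(**inputs, max_new_tokens=30)
caption = processor.decode(caption_ids[0], skip_special_tokens=True)

embedding = sbert.encode(caption)  # 384-dimensional vector for this model
print(caption, embedding.shape)
```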
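And a sketch of the CLIP variant, again with an assumed checkpoint name. The image is embedded at index time and the text query at search time, both into the same space, so similarity can be computed directly:

```python
from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("tshirt.jpg").convert("RGB")

with torch.no_grad():
    # Index time: embed the image.
    image_inputs = processor(images=image, return_tensors="pt")
    image_vec = model.get_image_features(**image_inputs)

    # Query time: embed the user's text query into the same space.
    text_inputs = processor(text=["red t-shirt with a logo"], return_tensors="pt", padding=True)
    text_vec = model.get_text_features(**text_inputs)

# Cosine similarity between the query vector and the stored image vector.
sim = torch.nn.functional.cosine_similarity(image_vec, text_vec)
print(sim.item())
```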
Lexical search is based on data generated by the BLIP model.
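For illustration, the lexical side can be a plain match query against the field that stores the BLIP captions; the index and field names below ("tshirts", "caption") are assumptions about how that output is stored:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Full-text (BM25) search over the BLIP-generated captions.
resp = es.search(
    index="tshirts",
    query={"match": {"caption": "red t-shirt with a logo"}},
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("caption"))
```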
When it comes to hybrid search, Elasticsearch does provide RRF (reciprocal rank fusion), but unfortunately it is a paid feature, so I used my own naive implementation.
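A minimal sketch of such a naive fusion, assuming the lexical and vector queries each return a ranked list of document ids (the function name and the constant k=60 are just the usual RRF convention, not code taken from this repository):

```python
def rrf_merge(result_lists, k=60, top_n=10):
    """Merge several ranked lists of document ids using reciprocal rank fusion."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Example: ids returned by the lexical query and by the kNN (vector) query.
lexical = ["doc3", "doc1", "doc7"]
vector = ["doc1", "doc9", "doc3"]
print(rrf_merge([lexical, vector]))  # doc1 and doc3 rise to the top
```

Documents that appear high in both lists accumulate the largest scores, which is the whole idea behind RRF: no score normalization is needed, only ranks.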
- make run: builds and runs the API, UI and Elasticsearch containers,
- make images: imports a set of images from the Hugging Face dataset,
- make import: imports the output data from the prepare-data.py script into Elasticsearch.
By default, the UI is available at http://localhost:8080