The main motivation behind building rubrix was to have a visual search engine powered entirely by Artificial Intelligence, tying together concepts from the fields of Natural Language Processing and Computer Vision, something we like to call "combined similarity search". Currently, rubrix has two main functionalities:
- take in a user input describing an image and retrieve five images that fit that description (image search)
- take in a user-uploaded image and retrieve five similar images (reverse-image search)
Please click here to learn more about the architecture and how rubrix works!
You can check out some of the images retrieved by rubrix for sample queries here.
This section describes the prerequisites and contains instructions to get the project up and running.
Currently, rubrix works flawlessly on Linux, and can be set up easily with all the prerequisite packages by following these instructions:
- Download the appropriate version of conda for your machine.
- Install it by running the conda_install.sh file with the command:
  $ bash conda_install.sh
- Add conda to the bash profile:
  $ source ~/.bashrc
- Navigate to rubrix/ (top-level directory) and create a conda virtual environment with the included environment.yml file using the following command:
  $ conda env create -f environment.yml
- Activate the virtual environment with the following command:
  $ conda activate rubrix
- To install the package with setuptools extras, use the following command in rubrix/ (the top-level directory containing the setup.py file):
  $ pip install .
Once the prerequisites have been installed, follow these instructions to build the project:
- Navigate to the rubrix/index directory.
- Run the bash script setup.sh with the following command:
  $ bash setup.sh
What does this do?
- Downloads the flickr8k image/captions dataset.
- Builds and sets up darknet/ within rubrix/index to enable object detection with YOLOv4.
- Creates the assets/index.json file, which is essentially an inverse-image index mapping each of the objects YOLOv4 was trained on to the images containing it.
- Creates the assets/imageEmbeddingLocations.json file, which essentially maps all the images in the database to the sentence embedding vectors generated for each of the captions in the database.
- Generates feature vectors describing all the images in the database and saves them to the assets/descriptors directory.
NOTE: The above script can take between 1.5 and 2 hours to complete execution.
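To make the idea of an inverse-image index more concrete, here is a minimal sketch of how a mapping from object labels to images can be built. It is not the code used by setup.sh: the function name build_inverted_index, the detections dictionary, and the image paths are all hypothetical, and the sketch only lists labels that were actually detected rather than every class YOLOv4 knows about.

```python
import json
from collections import defaultdict


def build_inverted_index(detections):
    """Map each object label to the sorted list of images containing it.

    `detections` is a hypothetical dict of image path -> detected labels.
    """
    index = defaultdict(set)
    for image_path, labels in detections.items():
        for label in labels:
            index[label].add(image_path)
    # Convert sets to sorted lists so the result is JSON-serializable.
    return {label: sorted(paths) for label, paths in sorted(index.items())}


# Made-up detector output, for illustration only.
detections = {
    "images/001.jpg": ["dog", "person"],
    "images/002.jpg": ["dog", "frisbee"],
    "images/003.jpg": ["person"],
}

index = build_inverted_index(detections)
print(json.dumps(index, indent=2))
```

With an index of this shape, finding every image that contains a given object is a single dictionary lookup instead of a scan over the whole dataset.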
- Download data assets from this link.
- Unzip and save the contents in rubrix/assets.
- All that is left is to change the paths in rubrix/assets/index.json and rubrix/assets/imageEmbeddingLocations.json so that they are relative to the local machine. This can be done as follows:
  - Ensure the corresponding virtual environment is active, or activate it with the following command:
    $ conda activate rubrix
  - Launch the Python interpreter in the terminal and run the following code snippet:
    >>> from rubrix.utils import fix_paths_in_index
    >>> path_to_index = "<absolute/path/to/rubrix/assets/index.json>"
    >>> path_to_emb = "<absolute/path/to/rubrix/assets/imageEmbeddingLocations.json>"
    >>> fix_paths_in_index(path_to_index, path_to_emb)
- Navigate to the rubrix/rubrix/index directory and run the following bash script:
  $ bash quick_setup.sh
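After fixing the paths, an optional sanity check is to confirm that the image paths stored in the JSON files actually exist on your machine. The sketch below is not part of rubrix. It makes no assumption about the exact layout of the JSON files and simply walks the whole structure, treating any string that ends in a common image extension as an image path. The helper names and the extension list are assumptions made for illustration.

```python
import json
import os

# Assumption: images in the assets use one of these extensions.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def iter_image_paths(node):
    """Yield every string in a nested JSON structure that looks like an image path."""
    if isinstance(node, str):
        if node.lower().endswith(IMAGE_EXTENSIONS):
            yield node
    elif isinstance(node, dict):
        # Paths may appear as keys or as values, so check both.
        for key, value in node.items():
            yield from iter_image_paths(key)
            yield from iter_image_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_image_paths(item)


def missing_image_paths(json_file):
    """Return the sorted image paths referenced in json_file that do not exist on disk."""
    with open(json_file, encoding="utf-8") as handle:
        data = json.load(handle)
    return sorted({path for path in iter_image_paths(data) if not os.path.exists(path)})
```

Calling missing_image_paths on the absolute path to index.json or imageEmbeddingLocations.json should return an empty list if every referenced image can be found. A non-empty list suggests the paths still point at another machine's directory layout.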
With the completion of these steps, you should be able to use rubrix.
- For image search, execute the rubrix/query/query_by_text method.
- For reverse image search, execute the rubrix/query/query_by_image_objects method.
You can also follow a working example for this here.
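As a rough intuition for what image search does conceptually, the sketch below ranks images by how similar their caption embeddings are to a query embedding and returns the best matches. This is not rubrix's implementation: the use of cosine similarity, the function names, and the three-dimensional toy vectors are assumptions made for illustration, and real sentence embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_images(query_vector, caption_vectors, k=5):
    """Return up to k image paths, ranked by their best-matching caption.

    `caption_vectors` is a list of (image_path, embedding) pairs; an image
    may appear several times, once per caption.
    """
    best = {}
    for image_path, vector in caption_vectors:
        score = cosine_similarity(query_vector, vector)
        if score > best.get(image_path, float("-inf")):
            best[image_path] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [image_path for image_path, _ in ranked[:k]]


# Toy embeddings, for illustration only.
captions = [
    ("a.jpg", [1.0, 0.0, 0.0]),
    ("a.jpg", [0.9, 0.1, 0.0]),
    ("b.jpg", [0.0, 1.0, 0.0]),
    ("c.jpg", [0.7, 0.7, 0.0]),
]
query = [1.0, 0.0, 0.0]
print(top_k_images(query, captions, k=2))  # ['a.jpg', 'c.jpg']
```

Keeping only the best score per image means an image with several captions appears at most once in the results.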
An alternative is to use rubrix as an application in a web browser.
- Navigate to the rubrix/rubrix/web directory.
- Enter the following command in the terminal to launch the web application:
  $ python app.py
This section is for deploying rubrix on a server, e.g. an Ubuntu Linux server on AWS.
- Navigate to the top-level directory.
- Enter the following command to build the Docker image:
  $ sudo docker build -t <YOUR-NAME>/rubrix .
- You can then run:
  $ sudo docker run -p 9000:80 <YOUR-NAME>/rubrix
The ideal setup for this would be to have an Apache/Nginx reverse proxy on the host system pointing to port 9000 in this case, with the host system's Apache/Nginx handling SSL. That way you can redeploy the application over and over again without worrying about remaking SSL certificates.
The Dockerfile does not use the environment.yml file because using conda in any sort of production environment is a nightmare. Changes made to environment.yml will therefore not be reflected in the Dockerized container.
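Once the container is running, a quick way to confirm that it is reachable on the mapped port is a small smoke test like the one below. This is only a sketch using the Python standard library. The helper name is_serving is hypothetical, and it assumes the web application answers a plain GET request on its root URL, which the source does not state.

```python
import urllib.error
import urllib.request


def is_serving(url, timeout=5.0):
    """Return True if a GET request to url completes with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP 4xx/5xx responses.
        return False


# Example, with the port mapping used above (-p 9000:80):
# is_serving("http://localhost:9000/")
```

Running the same check against the reverse proxy's public URL would exercise the proxy and SSL setup as well, not just the container.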
There are no specific guidelines for contributing, apart from a few general guidelines we tried to follow, such as:
- Code should follow PEP8 standards as closely as possible
- We use Google-Style docstrings to document the Python modules in this project.
If you see something that could be improved, send a pull request!
We are always happy to look at improvements, to ensure that rubrix, as a project, is the best version of itself.
If you think something should be done differently (or is just plain broken), please create an issue.
See the LICENSE file for more details.