The Sonata Data Ingestor retrieves release data from the MusicBrainz API for a list of artists, processes it, and saves the results to a Supabase database and to disk.
Note: This tool is part of the Sonata project, so it retrieves only the data that project needs.
| Repository | Description |
|---|---|
| Sonata | Web application that allows you to search for music releases by their cover description. |
| Sonata API | API for generating embeddings of user queries for the Sonata project. |
| Sonata Data Ingestor | Data ingestor for the Sonata project. |
- A Supabase account and a project set up (or comment out the Supabase-related code in `main.py`).
| Module | Description |
|---|---|
| main.py | Main script that runs the data ingestor. |
| musicbrainz/musicbrainz_api.py | MusicBrainzAPI class for interacting with the MusicBrainz API. |
| musicbrainz/release_group.py | ReleaseGroup class for interacting with data returned by the MusicBrainzAPI class. |
| musicbrainz/extended_release_group.py | ExtendedReleaseGroup class that extends the ReleaseGroup class with additional methods. |
| managers/csv_manager.py | CSVManager class for managing the CSV file. |
| managers/offset_manager.py | OffsetManager class for managing the offset of processed artists. |
| utils/utils.py | Utility functions for all the modules above. |
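
To illustrate what "managing the offset of processed artists" can mean in practice, here is a minimal sketch of resumable processing. It is not the project's `OffsetManager`: the class name `OffsetStore`, the function `process_artists`, and the idea of storing the offset as a plain integer in a text file are all assumptions made for this example.

```python
from pathlib import Path


class OffsetStore:
    """Hypothetical stand-in for an offset manager: remembers how many artists are done."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self) -> int:
        # A missing or empty file means nothing has been processed yet.
        if not self.path.exists():
            return 0
        text = self.path.read_text(encoding="utf-8").strip()
        return int(text) if text else 0

    def save(self, offset: int) -> None:
        # Create the parent directory on first use, then overwrite the stored offset.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(str(offset), encoding="utf-8")


def process_artists(artists, store, handle):
    """Process artists from the saved offset onward, saving progress after each one."""
    start = store.load()
    for index in range(start, len(artists)):
        handle(artists[index])
        # Persist progress so an interrupted run can resume at the next artist.
        store.save(index + 1)
```

With this pattern, rerunning the script after an interruption skips the artists that were already handled instead of starting from the top of the list.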
- Clone this repository to your local machine:

  ```
  git clone https://github.com/aivarovsky/sonata-data-ingestor
  ```

- Navigate to the project directory:

  ```
  cd sonata-data-ingestor
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Create a `.env` file with these environment variables:

  ```
  SUPABASE_URL = "YOUR SUPABASE PROJECT URL"
  SUPABASE_KEY = "YOUR SUPABASE PROJECT API KEY"
  SUPABASE_TABLE = "YOUR SUPABASE TABLE NAME"
  SUPABASE_BUCKET = "YOUR SUPABASE BUCKET NAME"
  ```

- Set up constants in `main.py`:

  ```python
  ARTISTS_FILE_PATH = "data/artists.txt"
  CSV_FILE_PATH = "data/db.csv"
  COVER_ART_DIR_PATH = "data/covers"
  MODEL_NAME = "clip-ViT-B-32"
  GENRE_LIST = ["rock", "pop", "r&b", "hip hop"]  # For filtering release group genres
  ```

- Run `main.py`:

  ```
  python main.py
  ```
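
If the run fails early, a missing Supabase setting is a likely cause. The sketch below checks that the four variables listed above are present. It is not part of the repository: the helper name `missing_settings` is hypothetical, and it assumes the values from `.env` have already been loaded into the process environment (for example by a dotenv-style loader).

```python
import os

# The four variable names come from the .env step above.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY", "SUPABASE_TABLE", "SUPABASE_BUCKET")


def missing_settings(env=None):
    """Return the names of required variables that are unset or blank."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings()` before the ingestor starts returns an empty list when everything is configured, or the names that still need a value.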
This project is licensed under the MIT License.