Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fma keys dataset #625

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

stellaywong
Copy link

This adds support for the FMA Keys dataset, which is a new dataset for the evaluation of key detection containing 340 hours (5489 songs) of song-level key and mode annotations, spread across 17 genres.

Copy link
Collaborator

@genisplaja genisplaja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @stellaywong, sorry for the late late late reply... we were very busy with soundata and its corresponding JOSS submission. Thank you for your contribution to mirdata :) We are targeting a release for next month, and we would like this loader to be included, glad to help on that! The PR is really good, it only needs a bit of refinement, the module docstring, and the documentation (find here the instructions for that!). Let us know if you need any help!

mirdata/datasets/fma_keys.py Outdated Show resolved Hide resolved
mirdata/datasets/fma_keys.py Outdated Show resolved Hide resolved
mirdata/datasets/fma_keys.py Outdated Show resolved Hide resolved
mirdata/datasets/fma_keys.py Outdated Show resolved Hide resolved
mirdata/datasets/fma_keys.py Show resolved Hide resolved
Comment on lines +249 to +280
# librosa has problems reading FMA mp3s without clamping down to the second.
duration = librosa.get_duration(path=fhandle)
return librosa.load(fhandle, sr=None, mono=True, duration=floor(duration))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow is that so? Can we track this error and see why it happens? I'll take a look. Nice workaround for now :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll dig more into it. We didn't see this with other mp3s, just with ones coming from the FMA dataset.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let me know if yu want us to also take a look at that issue :)

tests/datasets/test_fma_keys.py Outdated Show resolved Hide resolved
tests/datasets/test_fma_keys.py Outdated Show resolved Hide resolved
tests/datasets/test_fma_keys.py Outdated Show resolved Hide resolved
@magdalenafuentes
Copy link
Collaborator

@stellaywong thanks for contributing to mirdata! Just FYI, we're about to make a release for ISMIR in a couple of weeks. Do you have time to address this changes before? It would be great to include this loader in the release! If you don't have the time don't worry, but please let us know

This adds support for the FMA Keys dataset, which is a new dataset
for the evaluation of key detection containing 340 hours (5489 songs)
of song-level key and mode annotations, spread across 17 genres.
@stellaywong
Copy link
Author

Thank you for the reviews! I've made the changes and pushed the updates.

It looks like the PR build is stuck waiting for status to be reported? Do you know how we can kick start the build?

@genisplaja
Copy link
Collaborator

Hey @stellaywong, thanks for the update :) Sorry that I was afk for a few days. OK, that looks quite good, you are missing a few things that I am gonna list in my next review, but the loader looks good! Hope that you will have some time to do the final updates... thanks for the patience!

Copy link
Collaborator

@genisplaja genisplaja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, getting way closer! Thanks again for your patience and collaboration :) I just left few indications of the things your PR is missing. Please let me know if you have any questions. Thanks @stellaywong!

Comment on lines +215 to +217
return jams_utils.jams_converter(
metadata=self._track_metadata,
)
Copy link
Collaborator

@genisplaja genisplaja Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You jams_converter is missing the audio_path, so it cannot compute needed attributes of the corresponding audio. See this exemple:

return jams_utils.jams_converter(
    audio_path=self.audio_path,
    metadata={
        "instrument": self.instrument,
        "genre": self.genre,
        "drum": self.drum,
        "train": self.train,
    },
)

That will be needed for the tests to pass!

Comment on lines +1 to +5
"""
FMA Keys Dataset Loader

.. admonition:: Dataset Info
:class: dropdown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, now the docstrings looks quite good. However, you need to update some files in the documentation source code to make this visible to users.

  • docs/source/mirdata.rst: here, you need to add your dataset as it is done for the rest!
  • docs/source/table.rst: here, you need to add your dataset as it is done for the rest as well. Add the number of tracks, available annotations, data availability, licensing, etc.
  • docs/source/quick_reference.rst: if your dataset has some annotation type of information that is not available in quick reference to link, please add it there, and therefore, users will be able to click the annotations in the table and read more info!

Comment on lines +164 to +169
spotify_uri (str): Spotify URI if available
key (str): path to the track's audio file
mode (str): path to the track's audio file
key_number (int): path to the track's audio file
mode_number (int): path to the track's audio file
audio (str): path to the track's audio file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of them seem to have path to the track's audio file as description. Please update! also please change audio for audio_path` .

Comment on lines +249 to +280
# librosa has problems reading FMA mp3s without clamping down to the second.
duration = librosa.get_duration(path=fhandle)
return librosa.load(fhandle, sr=None, mono=True, duration=floor(duration))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let me know if yu want us to also take a look at that issue :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants