Adding loader for Jazz Trio Database #652

HuwCheston · 2024-12-23T10:59:21Z

Description

Please include the following information at the top level docstring for the dataset's module mydataset.py:

Describe annotations included in the dataset
Indicate the size of the dataset (e.g. number of files and duration in hours)
Mention the origin of the dataset (e.g. creator, institution)
Describe the type of music included in the dataset
Indicate any relevant papers related to the dataset
Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.
Run the script on the canonical version of the dataset and upload the index to Zenodo Audio Data Loaders community.
- Currently awaiting approval.
Create a sample version of the index with the necessary information for testing.
Create a module in mirdata, e.g. mirdata/my_dataset.py
Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py
Add your module to docs/source/mirdata.rst and docs/source/table.rst
Run black, flake8 and mypy (see Running your tests locally).
Run tests/test_full_dataset.py on your dataset.
- I have run this test on the canonical version of the database. However, note that I had to make changes to the Dataset class in order to pass the test_load_mtracks function. See my open issue: Failing test_full_dataset.py with custom MultiTrack class #651
Check that codecov coverage does not decrease.
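
As a rough illustration of the index-generation step in the checklist above, here is a minimal sketch of what such a script might do. It assumes an index that maps each track id to [relative path, MD5 checksum] pairs. The function names, the "audio" and "midi" keys, and the one-track-per-.wav layout are hypothetical and are not taken from the actual JTD script.

```python
import hashlib
import json
import os


def md5(file_path: str) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    hasher = hashlib.md5()
    with open(file_path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(4096), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def make_index(data_path: str, version: str = "sample") -> dict:
    """Build a minimal index with one track per .wav file under data_path."""
    tracks = {}
    for root, _, files in os.walk(data_path):
        for name in sorted(files):
            if not name.endswith(".wav"):
                continue
            full_path = os.path.join(root, name)
            track_id = os.path.splitext(name)[0]
            tracks[track_id] = {
                # Paths are stored relative to the dataset root
                "audio": [os.path.relpath(full_path, data_path), md5(full_path)],
                # Assumption: an unavailable file is recorded as a pair of nulls
                "midi": [None, None],
            }
    return {"version": version, "tracks": tracks}


def write_index(data_path: str, out_path: str) -> None:
    """Serialize the index to JSON."""
    with open(out_path, "w", encoding="utf-8") as fhandle:
        json.dump(make_index(data_path), fhandle, indent=2)
```

Storing relative paths keeps the index valid regardless of where a user downloads the data.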

If your dataset is not fully downloadable there are two extra steps you should follow:

Contacting the mirdata organizers by opening an issue or PR so we can discuss how to proceed with the closed dataset.
- I am opening this PR, hopefully this is ok for discussion 😀
Show that the version used to create the checksum is the "canonical" one, either by getting the version from the dataset creator, or by verifying equivalence with several other copies of the dataset.
- I am the creator of the database and have created the checksums using my own, canonical, version.
Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes.
- I have run this test on the canonical version of the database. However, note that I had to make changes to the Dataset class in order to pass the test_load_mtracks function. See my open issue: Failing test_full_dataset.py with custom MultiTrack class #651

pmcharrison · 2024-12-24T15:25:55Z

mirdata/datasets/jtd.py

+
+        """
+        return (
+            self._multitrack_metadata["mbz_id"]


Consider simplifying these kinds of expressions to

self._multitrack_metadata.get("mbz_id", None)

Thanks, I'd agree that your suggestion is more pythonic @pmcharrison but I've seen this syntax used in other loaders within mirdata e.g. cipi.py.

Will wait for more opinions!

Taking another look now, I think this syntax is potentially redundant. The structure of the JTD metadata means that these fields (e.g., mbz_id, track_name, etc.) will always be present for every track, with no missing values.

So, in reality, these functions will never return None, and will always return the expected type (str, int, whatever) from the field. With that in mind, in 87227c6 I'm just accessing the desired fields directly without any of this additional logic. e.g., we can just do:

@property
def musicbrainz_id(self) -> str:
    """The MusicBrainz ID for the recording

    Returns:
        * str - musicbrainz ID

    """
    return self._multitrack_metadata["mbz_id"]

Happy to revert back if the mirdata maintainers prefer.

I personally prefer the .get() method, but I see that its use is not consistent across all loaders, and that it is not technically needed here. For now, no need to change it. I'll let the other maintainers give their opinion if they wish!
In any case... maybe it'd be cool to standardize this across the whole package!
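
For readers following this thread, the behavioural difference between the two access styles can be sketched as below. The metadata dictionary and its values are hypothetical placeholders, not real JTD entries.

```python
# Hypothetical metadata for one multitrack; the real JTD fields may differ.
metadata = {"mbz_id": "example-id", "track_name": "Example Tune"}

# Direct indexing returns the value when the key exists...
direct_value = metadata["mbz_id"]

# ...and raises KeyError when it does not.
try:
    metadata["missing_field"]
    direct_result = "no error"
except KeyError:
    direct_result = "KeyError"

# .get() returns None (or a chosen default) instead of raising.
get_result = metadata.get("missing_field", None)
```

If every field is guaranteed to be present, as described for JTD above, both styles return the same value. They differ only in how a missing key surfaces.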


HuwCheston · 2025-01-02T09:54:01Z

Made some changes in response to @pmcharrison's comments (for maintainers, Peter is a co-author on the original paper introducing this dataset)

pmcharrison · 2025-01-02T10:24:51Z

These changes look good, thank you!

codecov · 2025-01-02T20:11:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.13%. Comparing base (581f4c4) to head (f6f6d43).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #652      +/-   ##
==========================================
+ Coverage   97.07%   97.13%   +0.06%     
==========================================
  Files          68       69       +1     
  Lines        7583     7747     +164     
==========================================
+ Hits         7361     7525     +164     
  Misses        222      222

genisplaja

Hey @HuwCheston and also @pmcharrison, thanks for this great PR! That is neat :) I am leaving some minor suggested changes here, mostly to improve the readability of the docs. About Issue #651... I am taking a look at it. But, as I am also suggesting for the new decorator, I think these fixes and upgrades would be better placed in separate PRs :) I'd like the other mirdata maintainers to take a look, though. Thanks again!


genisplaja · 2025-01-03T13:44:42Z

mirdata/datasets/jtd.py

+def coerce_to_string_io_multiple_args(func) -> Callable:
+    """Little hack of the decorator in mirdata.io that allows for multiple args to be passed to the `func`"""
+
+    @functools.wraps(func)
+    def wrapper(
+        file_path_or_obj: Optional[Union[str, TextIO]], *args
+    ) -> Optional[io.T]:
+        if not file_path_or_obj:
+            return None
+        if isinstance(file_path_or_obj, str):
+            with open(file_path_or_obj, encoding="utf-8") as f:
+                return func(f, *args)
+        else:
+            return func(file_path_or_obj, *args)
+
+    return wrapper


Ok! We have been doing it differently when we have load_* functions with multiple args; however, this solution might be a bit better. I would like the other mirdata maintainers to take a look. In any case, if we choose your solution, I'd suggest using here the same hack that other loaders use for load_* functions with multiple args, and then we can open a separate PR to add this decorator to the core files and update all loaders whose load functions take multiple arguments. Thanks for that!

I'd suggest to use the same hack that is used in other loaders for the load_* functions with multiple args here

Thanks -- is there any chance you can point me towards these other loaders so I can implement this?
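
To make the behaviour of the decorator under discussion concrete, here is a self-contained sketch of how it could be applied to a load function that takes an extra argument. The decorator body follows the diff above. The `T` type variable stands in for `io.T` from mirdata, and `load_column` is a hypothetical loader invented for illustration, not a function from the JTD module.

```python
import functools
import io
from typing import Callable, List, Optional, TextIO, TypeVar, Union

T = TypeVar("T")  # stand-in for the `io.T` type variable used in the PR


def coerce_to_string_io_multiple_args(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Open a path (or pass through a file object) and forward extra args to `func`."""

    @functools.wraps(func)
    def wrapper(file_path_or_obj: Optional[Union[str, TextIO]], *args) -> Optional[T]:
        # None or an empty path means there is nothing to load
        if not file_path_or_obj:
            return None
        # A string is treated as a path and opened as UTF-8 text
        if isinstance(file_path_or_obj, str):
            with open(file_path_or_obj, encoding="utf-8") as f:
                return func(f, *args)
        # Anything else is assumed to be an open text stream
        return func(file_path_or_obj, *args)

    return wrapper


@coerce_to_string_io_multiple_args
def load_column(fhandle: TextIO, column: int) -> List[float]:
    """Hypothetical loader: read one comma-separated column as floats."""
    return [float(line.split(",")[column]) for line in fhandle if line.strip()]


# The same call works with an open stream, a file path, or None
values = load_column(io.StringIO("0.5,1\n1.0,2\n"), 0)
missing = load_column(None, 0)
```

Note that only positional extra arguments are forwarded. Keyword arguments would need `**kwargs` added to the wrapper.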

HuwCheston · 2025-01-03T15:57:11Z

Thanks @genisplaja, should've now made the requested changes to the docs and can see from the readthedocs build that they look better. If you can point me towards the same hack that is used in other loaders for the load_* functions with multiple args I'm happy to integrate this rather than creating my own hack of the decorator :)

HuwCheston added 24 commits December 19, 2024 13:06

Added index script

65333a6

Updated JTD index filepath

b122fd7

Added resources for tests

39362a1

Early version of jtd module

c95d92d

When making JTD index use tuple of nulls rather than a single null

afe3a7c

Stripping trailing '/' from index

8c5836c

Add metadata to track dictionaries when creating JTD index

f7cfbe7

Update test resources and index

b64f763

Update JTD module

a19b986

Add beats file to multitrack indexes for JTD

f8b8964

Update sample index to include beats

9092c75

Add JTD to docs

261fa8d

Update JTD module with beats for individual stems

7173d00

Add JTD tests

c386af1

Increase test coverage

dc165fc

Add smart_open to JTD module

3400c89

Make instrument an attribute of JTD module, not a property

c1cbeb5

Minor change for typing

d18516f

Bump index checksum

65b2c0b

Return None when no MIDI available

6ac7b0c

Maybe added attribute necessary to pass failing test_load_mtracks

665001a

Add missing property to JTD multitrack

5a4f899

Allow parsing timestamps in hour-minute-second format

b019436

Black formatting

234097d

HuwCheston changed the title ~~[WIP] Adding loader for Jazz Trio Database~~ Adding loader for Jazz Trio Database Dec 23, 2024

pmcharrison reviewed Dec 24, 2024


pmcharrison reviewed Dec 27, 2024


HuwCheston added 3 commits January 2, 2025 09:23

Module docstring changes

4e42bdc

Module code changes

87227c6

Remove Optional from some type hints

a7e9025

HuwCheston added 2 commits January 3, 2025 09:41

Increase test coverage

25bd9d2

Black formatting

f6f6d43

genisplaja requested changes Jan 3, 2025


HuwCheston added 2 commits January 3, 2025 15:44

Fixes for docs

0d76c06

Fixes for docs

0679cc8

HuwCheston requested a review from genisplaja January 3, 2025 15:58
