-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding loader for Jazz Trio Database #652
base: master
Are you sure you want to change the base?
Conversation
mirdata/datasets/jtd.py
Outdated
|
||
""" | ||
return ( | ||
self._multitrack_metadata["mbz_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider simplifying these kinds of expressions to
self._multitrack_metadata.get("mbz_id", None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'd agree that your suggestion is more pythonic @pmcharrison but I've seen this syntax used in other loaders within mirdata
e.g. cipi.py
.
Will wait for more opinions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking another look now, I think this syntax is potentially redundant. The structure of the JTD metadata means that these fields (e.g., mbz_id
, track_name
, etc.) will always be present for every track, with no missing values.
So, in reality, these functions will never return None
, and will always return the expected type (str
, int
, whatever) from the field. With that in mind, in 87227c6 I'm just accessing the desired fields directly without any of this additional logic. e.g., we can just do:
@property
def musicbrainz_id(self) -> str:
"""The MusicBrainz ID for the recording
Returns:
* str - musicbrainz ID
"""
return self._multitrack_metadata["mbz_id"]
Happy to revert back if the mirdata
maintainers prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer the .get()
method, but I see that this is not something consistent throughout all loaders, and that here is not technically needed. For now, no need to change it. I'll let the other maintainers to give their opinion if they wish so!
In any case... mybe it'd be cool to standardize it in all the package...!
Made some changes in response to @pmcharrison's comments (for maintainers, Peter is a co-author on the original paper introducing this dataset) |
These changes look good, thank you! |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #652 +/- ##
==========================================
+ Coverage 97.07% 97.13% +0.06%
==========================================
Files 68 69 +1
Lines 7583 7747 +164
==========================================
+ Hits 7361 7525 +164
Misses 222 222 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @HuwCheston and also @pmcharrison, thanks for this great PR! That is neat :) I am leaving here minor suggested changes mostly to improve readability of the docs. About Issue #651... I am taking a look at it. But also as I am suggesting for the new decorator, I think this fixes and upgrade could be better included in separated PRs :) I'd like the rest of the colleagues maintaining mirdata to take a look though. Thanks again!
def coerce_to_string_io_multiple_args(func) -> Callable: | ||
"""Little hack of the decorator in mirdata.io that allows for multiple args to be passed to the `func`""" | ||
|
||
@functools.wraps(func) | ||
def wrapper( | ||
file_path_or_obj: Optional[Union[str, TextIO]], *args | ||
) -> Optional[io.T]: | ||
if not file_path_or_obj: | ||
return None | ||
if isinstance(file_path_or_obj, str): | ||
with open(file_path_or_obj, encoding="utf-8") as f: | ||
return func(f, *args) | ||
else: | ||
return func(file_path_or_obj, *args) | ||
|
||
return wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! We have been doing it differently when we have load_*
functions with multiple args, however, this solution might be a bit better. I would like the other mirdata maintainers to take a look, but in any case, if we chose your solution, I'd suggest to use the same hack that is used in other loaders for the load_*
functions with multiple args here, and then we can have a separate PR to include this decorator to the core files and update all loaders with load functions that have multiple arguments. Thanks for that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to use the same hack that is used in other loaders for the load_* functions with multiple args here
Thanks -- is there any chance you can point me towards these other loaders so I can implement this?
Thanks @genisplaja, should've now made the requested changes to the docs and can see from the readthedocs build that they look better. If you can point me towards |
Description
Please include the following information at the top level docstring for the dataset's module mydataset.py:
Dataset loaders checklist:
scripts/
, e.g.make_my_dataset_index.py
, which generates an index file.mirdata/my_dataset.py
tests/datasets/
, e.g.test_my_dataset.py
docs/source/mirdata.rst
anddocs/source/table.rst
black
,flake8
andmypy
(see Running your tests locally).tests/test_full_dataset.py
on your dataset.Dataset
class in order to pass thetest_load_mtracks
function. See my issue open here Failingtest_full_dataset.py
with customMultiTrack
class #651If your dataset is not fully downloadable there are two extra steps you should follow:
pytest -s tests/test_full_dataset.py --local --dataset my_dataset
once on your dataset locally and confirmed it passes.Dataset
class in order to pass thetest_load_mtracks
function. See my issue open here Failingtest_full_dataset.py
with customMultiTrack
class #651