No such file or directory in chapter1-preprocessing when trying to calculate durations #159

Open
f-fritz opened this issue Jan 17, 2024 · 0 comments

f-fritz commented Jan 17, 2024

Hey

I am not sure what the expected behaviour is, or whether this is my mistake, an error in the course, or a problem with the dataset used, but I noticed the following in Chapter 1 - Preprocessing:

When I follow the course and try to execute

# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]

it fails because the path (x in the code snippet) looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav. That is exactly the value provided by the unmodified dataset, as can be seen on the datasets page, but the file does not exist on my machine.
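A quick way to confirm that the stored paths are the problem is to check them against the local filesystem. The sketch below uses only the standard library; `split_by_existence` is a hypothetical helper, and the two demo paths are made up for illustration rather than taken from the dataset.

```python
import os
import tempfile


def split_by_existence(paths):
    """Return (existing, missing) lists for the given file paths."""
    existing, missing = [], []
    for p in paths:
        (existing if os.path.isfile(p) else missing).append(p)
    return existing, missing


# Demo data (hypothetical): one real temporary file and one path that
# points into a directory that does not exist.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    real_path = tmp.name
fake_path = os.path.join(
    tempfile.gettempdir(), "no-such-dir-for-demo", "response_4.wav"
)

existing, missing = split_by_existence([real_path, fake_path])
print(len(existing), "found;", len(missing), "missing")

os.remove(real_path)  # clean up the temporary file
```

Assuming `minds["path"]` returns a list of strings, calling `split_by_existence(minds["path"])` would show how many of the stored paths actually resolve on the local machine.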

Am I using the load_dataset function incorrectly? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way to automatically replace the 'path' value in the dataset with the local path on my machine?

Alternatively, one could change the librosa.get_duration(path=x) call to pass the audio array and the sampling_rate instead, e.g.

new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]
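
To illustrate why the array-based variant does not need the file on disk: for a one-dimensional (mono) signal, the duration is the number of samples divided by the sampling rate, which, as far as I know, is also what librosa.get_duration(y=..., sr=...) computes. The sketch below is an illustration only; `duration_seconds` is a hypothetical helper, and the records are synthetic stand-ins for the entries of minds["audio"], not real dataset content.

```python
def duration_seconds(audio):
    """Duration in seconds of a mono clip.

    `audio` is assumed to be a dict with an "array" of samples and a
    "sampling_rate" in Hz, mirroring the keys used in the snippet above.
    """
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-ins for decoded audio entries (hypothetical data).
fake_audio_column = [
    {"array": [0.0] * 8000, "sampling_rate": 8000},  # 8000 samples at 8 kHz
    {"array": [0.0] * 4000, "sampling_rate": 8000},  # 4000 samples at 8 kHz
]

new_column = [duration_seconds(x) for x in fake_audio_column]
print(new_column)  # [1.0, 0.5]
```

Since this only reads the decoded samples, it does not depend on the 'path' value at all.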