WER Filtering takes too long? #80
Comments
Hey @macabdul9, do you have the bash configuration file you're using to reproduce this error? Seeing the exact configuration would make it much easier to advise appropriately here.
Generally speaking, you should ensure that the number of workers is less than or equal to the number of CPUs on your device (you can check this with a bash command).
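As a rough illustration of the advice above, the same check can be done from Python before launching the filtering job. This is only a sketch: `available_cpus` and `cap_num_workers` are hypothetical helper names, not part of the project's scripts.

```python
import os


def available_cpus() -> int:
    """Return the number of CPUs this process may use (at least 1)."""
    # On Linux, the affinity mask reflects container or taskset limits,
    # which can be smaller than the machine's total CPU count.
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    # Fallback for platforms without sched_getaffinity.
    return os.cpu_count() or 1


def cap_num_workers(requested: int) -> int:
    """Clamp a requested worker count to the range [1, available CPUs]."""
    return max(1, min(requested, available_cpus()))


if __name__ == "__main__":
    print(f"CPUs available: {available_cpus()}")
    print(f"Workers to use for a request of 16: {cap_num_workers(16)}")
```

Note that this only bounds the worker count by CPUs. Each forked worker also needs memory, so a count that passes this check can still fail with an out-of-memory error on a machine with little free RAM.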
I have replaced HF evaluate's WER metric with jiwer's (which I believe is the same), and it fixes the issue. So most likely it has something to do with multiprocessing. Thanks.
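For anyone wanting to try the same swap, a minimal sketch might look like the following. It is not the exact change made here: `keep_sample` and `max_wer` are hypothetical names, the threshold is assumed to be a fraction rather than a percentage, and the pure-Python fallback exists only so the snippet runs when jiwer is not installed.

```python
try:
    # jiwer.wer(reference, hypothesis) returns the word error rate as a fraction.
    from jiwer import wer as compute_wer
except ImportError:
    def compute_wer(reference: str, hypothesis: str) -> float:
        """Fallback word-level WER; assumes a non-empty reference."""
        ref, hyp = reference.split(), hypothesis.split()
        # Word-level Levenshtein distance, computed one row at a time.
        prev = list(range(len(hyp) + 1))
        for i, ref_word in enumerate(ref, start=1):
            curr = [i]
            for j, hyp_word in enumerate(hyp, start=1):
                curr.append(min(
                    prev[j] + 1,                           # deletion
                    curr[j - 1] + 1,                       # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                ))
            prev = curr
        return prev[-1] / len(ref)


def keep_sample(reference: str, hypothesis: str, max_wer: float) -> bool:
    """Return True if the hypothesis is within the WER threshold (hypothetical helper)."""
    return compute_wer(reference, hypothesis) <= max_wer


if __name__ == "__main__":
    print(keep_sample("the cat sat", "the cat sat", max_wer=0.1))
    print(keep_sample("the cat sat", "the dog ran", max_wer=0.1))
```

Because the per-sample WER is a plain function call, a predicate like this can be handed to whatever filtering routine is already in use without creating a metric object in every worker.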
Hi @sanchit-gandhi!
Currently, WER filtering takes far too long with 8 workers, and going beyond 8 workers fails with:
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Also, the filtered data does not seem to be cached, which makes it very hard to run on large datasets (up to 1M segments). Is there a way to speed up the filtering process?