-
Hi @vishjain, it is usually not easy to get a good noise estimate from a speech enhancement model trained solely to generate enhanced speech. One reason is that such models are often trained with a scale-invariant loss (e.g., the scale-invariant signal-to-noise ratio, SI-SNR, loss), so the enhanced speech can have a very different scale from the original input. Thus, we cannot simply obtain the noise signal by subtracting the enhanced speech from the noisy input. Another reason is that even when the model generates an enhanced speech signal with an appropriate scale, the residual after subtraction can still contain some speech because the speech estimate is imperfect. The best solution is to use a speech enhancement model that was already trained to estimate both the speech and noise signals from the input.
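If you still want to try subtraction without retraining, one rough workaround for the scale issue is to rescale the enhanced speech before subtracting it. The sketch below is only an assumption on my side, not something built into the model or the toolkit: it picks the single gain that best fits the enhanced signal to the mixture in the least-squares sense, then subtracts. The function name `estimate_noise` and the `eps` parameter are made up for illustration. It only addresses the scale mismatch, so any speech the model failed to capture will still leak into the residual.

```python
import numpy as np

def estimate_noise(mixture, enhanced, eps=1e-8):
    """Rough noise estimate: mixture minus a least-squares-rescaled enhanced signal.

    Hypothetical helper (not part of any library). Both inputs are
    single-channel 1-D waveforms at the same sample rate.
    """
    mixture = np.asarray(mixture, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)

    # Guard against small length differences between the two signals.
    n = min(len(mixture), len(enhanced))
    mixture, enhanced = mixture[:n], enhanced[:n]

    # Gain that minimizes ||mixture - alpha * enhanced||^2.
    alpha = np.dot(mixture, enhanced) / (np.dot(enhanced, enhanced) + eps)

    # Residual after removing the rescaled speech estimate.
    noise = mixture - alpha * enhanced
    return noise, alpha
```

This assumes the two waveforms are time-aligned and that the noise is roughly uncorrelated with the speech; if either assumption fails, the fitted gain will be off and the residual will be worse.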
-
Hey,
Is there a way to approximate the noise (to write to a file) from the original mixed audio and the enhanced speech audio without having to train a new model?
I have the original mixed audio, a single-channel enhancement model, and the output enhanced speech audio, but I'd also like to extract the noise.
I tried roughly doing mixed audio minus enhanced audio for models like this: https://huggingface.co/espnet/yen-ju-lu-dns_ins20_enh_train_enh_blstm_tf_raw_valid.loss.best
but they only produce noise files that still have speech with some background noise. Would appreciate any ideas!