Wrong Supported samplerate for Task 3: Target speaker extraction (16kHz) #34

yezhangyinge · 2024-12-17T08:30:03Z

I tried "Sub-Task 1: Audio-only Speaker Extraction Conditioned on a Reference Speech", but found that the supported sample rate is 8 kHz, and the model only works at 8 kHz. If I input 16 kHz audio, the model outputs wrong results.

The model is also really bad at separating 2 speakers. Is the checkpoint correct?

zexupan · 2024-12-17T08:46:13Z

Sorry, we made a mistake: the "Sub-Task 1: Audio-only Speaker Extraction Conditioned on a Reference Speech" model supports 8 kHz, not 16 kHz.
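
Since this sub-task expects 8 kHz input, it may help to verify a file's sample rate before running extraction. Below is a minimal sketch using only Python's standard `wave` module. The names `read_sample_rate`, `check_sample_rate`, and `EXPECTED_SAMPLE_RATE` are hypothetical helpers made up for illustration and are not part of this repository. The sketch assumes uncompressed PCM WAV input.

```python
import wave

# Assumption: 8000 Hz, taken from the correction in this thread.
EXPECTED_SAMPLE_RATE = 8000


def read_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path, expected=EXPECTED_SAMPLE_RATE):
    """Return the file's sample rate, or raise ValueError on a mismatch."""
    actual = read_sample_rate(path)
    if actual != expected:
        raise ValueError(
            f"{path}: sample rate is {actual} Hz, expected {expected} Hz; "
            "resample the audio before running extraction"
        )
    return actual
```

If the check fails for 16 kHz audio, the file would need to be resampled to 8 kHz first, for example with `scipy.signal.resample_poly(samples, up=1, down=2)`, which applies an anti-aliasing filter while downsampling.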

zexupan · 2024-12-17T08:58:22Z

As for the performance, the model for this sub-task wasn't trained on a large amount of data. It was trained only on WSJ0-2mix, so it may generalise poorly.

yezhangyinge · 2024-12-18T05:18:32Z

OK, I get it. Thanks for your reply! I would like to know whether there will be a model trained on a large amount of data. Or will there be a model other than SpEx+ (such as BSRNN or TFGridNet) with better performance?
