Has translate been integrated into transcribe? It returns English but Chinese is expected. #183

EarlWilliam · 2024-03-06T15:30:04Z

Hi all. I am trying to use large-v3-32-2-conditioned-prompt-logic-timestamped to transcribe Chinese-language audio. However, it returns an English translation of the original Chinese content.

Here is the code:
from whisper_jax import FlaxWhisperPipline
pipeline = FlaxWhisperPipline("sanchit-gandhi/large-v3-32-2-conditioned-prompt-logic-timestamped")
outputs = pipeline("R1.wav", task="transcribe", return_timestamps=True, language="chinese")
print(outputs)

Here is the result with task="translate" and language="chinese":
{'text': " Hello, 130 is at your service. Hello. Hello. The parking lot at the entrance is full of black Mercedes Benz cars. All cars are waiting to go to work. Oh, the police station is not there yet, right? Not yet. It's been more than half an hour. OK, we'll hurry up. Wait a minute. OK. We will rush it. Wait a moment.", 'chunks': [{'timestamp': (0.0, 5.6), 'text': ' Hello, police station 130 is at your service.'}, {'timestamp': (5.6, 6.4), 'text': ' Hello.'}, {'timestamp': (6.4, 7.8), 'text': ' Hello.'}, {'timestamp': (7.8, 12.4), 'text': ' The parking lot at the entrance is full of black Mercedes Benz cars.'}, {'timestamp': (12.4, 14.8), 'text': ' All cars are waiting to go to work.'}, {'timestamp': (14.8, 16.6), 'text': ' Oh, the police station is not there yet, right?'}, {'timestamp': (16.6, 19.6), 'text': " Not yet. It's been more than half an hour."}, {'timestamp': (19.6, 21.0), 'text': " OK, we'll hurry up."}, {'timestamp': (21.0, 21.6), 'text': ' Wait a minute.'}, {'timestamp': (21.6, 22.6), 'text': ' OK.'}, {'timestamp': (19.87, 21.87), 'text': ' We will rush it. Wait a moment.'}]}

Here is the result with task="transcribe" and language="chinese":
{'text': " Hello, 1.30 For you for you. Hello. Hey, you know. This is the food's at the way a lot of the car businging's all the car all the car are still not not. No, it's It's just half a hour hours. Okay, we're we're okay, we're okay, we're let's let's let's wait. Thank you.", 'chunks': [{'timestamp': (0.0, 0.84), 'text': ' Hello,'}, {'timestamp': (0.84, 3.72), 'text': ' 1.30'}, {'timestamp': (3.72, 5.8), 'text': ' For you for you.'}, {'timestamp': (5.8, 6.64), 'text': ' Hello.'}, {'timestamp': (6.64, 8.36), 'text': ' Hey, you know.'}, {'timestamp': (8.36, 9.52), 'text': ' This is the'}, {'timestamp': (9.52, 10.68), 'text': " food's at the way"}, {'timestamp': (10.68, 11.72), 'text': ' a lot of the car'}, {'timestamp': (11.72, 12.8), 'text': " businging's"}, {'timestamp': (12.8, 13.72), 'text': ' all the car'}, {'timestamp': (13.72, 14.84), 'text': ' all the car'}, {'timestamp': (14.84, 15.92), 'text': ' are still'}, {'timestamp': (15.92, 15.96), 'text': ' not'}, {'timestamp': (15.96, 16.88), 'text': ' not.'}, {'timestamp': (16.88, 17.8), 'text': ' No,'}, {'timestamp': (17.8, 18.68), 'text': " it's"}, {'timestamp': (18.68, 18.96), 'text': " It's just half"}, {'timestamp': (18.96, 19.12), 'text': ' a'}, {'timestamp': (19.12, 19.16), 'text': ' hour'}, {'timestamp': (19.16, 19.72), 'text': ' hours.'}, {'timestamp': (19.72, 19.92), 'text': ' Okay,'}, {'timestamp': (19.92, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': " we're okay,"}, {'timestamp': (20.72, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': ' okay,'}, {'timestamp': (20.72, 22.36), 'text': " we're"}, {'timestamp': (22.36, 21.72), 'text': " let's"}, {'timestamp': (21.72, 22.92), 'text': " let's"}, {'timestamp': (22.92, None), 'text': " let's wait. Thank you."}]}

Here is the result returned by openai-large-v2:
{'text': '您好,话务员为您服务。你好。喂,你好。这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。这边所有车辆都等着上班啊。哦,人还没到是吧?没到,已经都半个多小时了。行行,我们催一下,稍等,马上就到了。好,那我们先催一下,稍等,马上就要了。', 'chunks': [{'timestamp': (0.0, 6.0), 'text': '您好,话务员为您服务。'}, {'timestamp': (6.0, 7.0), 'text': '你好。'}, {'timestamp': (7.0, 8.0), 'text': '喂,你好。'}, {'timestamp': (8.0, 12.0), 'text': '这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。'}, {'timestamp': (12.0, 15.0), 'text': '这边所有车辆都等着上班啊。'}, {'timestamp': (15.0, 17.0), 'text': '哦,人还没到是吧?'}, {'timestamp': (17.0, 20.0), 'text': '没到,已经都半个多小时了。'}, {'timestamp': (20.0, 23.0), 'text': '行行,我们催一下,稍等,马上就到了。'}, {'timestamp': (19.87, 21.87), 'text': '好,那我们先催一下,稍等,马上就要了。'}]}

Thanks for any advice.
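
A quick way to confirm which script a pipeline result is written in is to measure the share of CJK ideographs in the returned text. The snippet below is a minimal sketch, not part of whisper_jax: the helper name `cjk_ratio` is hypothetical, and the two sample strings are short excerpts copied from the outputs above.

```python
import unicodedata


def cjk_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are CJK ideographs."""
    # Digits, whitespace and punctuation are ignored so they do not skew the ratio.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(
        1 for ch in letters
        if "CJK UNIFIED IDEOGRAPH" in unicodedata.name(ch, "")
    )
    return cjk / len(letters)


# Excerpts taken from the results posted in this issue.
english_like = " Hello, 130 is at your service. Hello. Hello."
chinese_like = "您好,话务员为您服务。你好。喂,你好。"

print(cjk_ratio(english_like))  # 0.0
print(cjk_ratio(chinese_like))  # 1.0
```

A ratio near 0.0 for audio that is known to be Chinese suggests the model is emitting English regardless of the `language` argument.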

EarlWilliam · 2024-03-06T15:31:30Z

It seems the model correctly recognizes the content of the audio, but translates it to English before returning it.

sanchit-gandhi · 2024-03-06T16:04:18Z

Hey @EarlWilliam - this model is part of the Distil-Whisper series, and is thus trained on English speech only. This likely explains why it only transcribes in English. If you're interested in training a Distil-Whisper model in Chinese, refer to the training guide: https://github.com/huggingface/distil-whisper/tree/main/training

Otherwise, you can select a multilingual Whisper model from the Hugging Face Hub: https://huggingface.co/models?language=zh&other=whisper&sort=trending

A multilingual model will respect the language argument you pass to the pipeline.
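
As a rough sketch of that suggestion, the original call can be kept as-is with only the checkpoint swapped for a multilingual one. The checkpoint `openai/whisper-large-v2` is an assumption chosen here because the issue reports Chinese output from large-v2; any multilingual Whisper checkpoint from the Hub link above should work the same way. The wrapper name `transcribe_chinese` is hypothetical.

```python
# Assumed multilingual checkpoint; substitute any multilingual Whisper model.
MULTILINGUAL_CHECKPOINT = "openai/whisper-large-v2"


def transcribe_chinese(audio_path: str, checkpoint: str = MULTILINGUAL_CHECKPOINT) -> dict:
    """Transcribe Chinese audio with a multilingual Whisper checkpoint."""
    # Imported lazily so this file can be loaded without whisper_jax installed.
    # "FlaxWhisperPipline" is the class name as spelled in the original post.
    from whisper_jax import FlaxWhisperPipline

    pipeline = FlaxWhisperPipline(checkpoint)
    # Same arguments as in the original post; only the checkpoint differs.
    return pipeline(
        audio_path,
        task="transcribe",
        language="chinese",
        return_timestamps=True,
    )


# Example usage (requires whisper_jax and the audio file from the issue):
# outputs = transcribe_chinese("R1.wav")
# print(outputs["text"])
```

The first call compiles the model, so it will be noticeably slower than later calls on the same pipeline object.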

EarlWilliam · 2024-03-06T16:09:12Z

Thank you for your explanation! I will give it a try.