Has translate been integrated into transcribe? It returns English, but I expect Chinese. #183

Open
EarlWilliam opened this issue Mar 6, 2024 · 3 comments

Comments

@EarlWilliam

Hi all. I am trying to use large-v3-32-2-conditioned-prompt-logic-timestamped to transcribe Chinese audio. However, it returns an English translation of the original Chinese content.

Here is the code:
from whisper_jax import FlaxWhisperPipline
pipeline = FlaxWhisperPipline("sanchit-gandhi/large-v3-32-2-conditioned-prompt-logic-timestamped")
outputs = pipeline("R1.wav", task="transcribe", return_timestamps=True, language="chinese")
print(outputs)

Here is the result with task="translate" and language="chinese":
{'text': " Hello, 130 is at your service. Hello. Hello. The parking lot at the entrance is full of black Mercedes Benz cars. All cars are waiting to go to work. Oh, the police station is not there yet, right? Not yet. It's been more than half an hour. OK, we'll hurry up. Wait a minute. OK. We will rush it. Wait a moment.", 'chunks': [{'timestamp': (0.0, 5.6), 'text': ' Hello, police station 130 is at your service.'}, {'timestamp': (5.6, 6.4), 'text': ' Hello.'}, {'timestamp': (6.4, 7.8), 'text': ' Hello.'}, {'timestamp': (7.8, 12.4), 'text': ' The parking lot at the entrance is full of black Mercedes Benz cars.'}, {'timestamp': (12.4, 14.8), 'text': ' All cars are waiting to go to work.'}, {'timestamp': (14.8, 16.6), 'text': ' Oh, the police station is not there yet, right?'}, {'timestamp': (16.6, 19.6), 'text': " Not yet. It's been more than half an hour."}, {'timestamp': (19.6, 21.0), 'text': " OK, we'll hurry up."}, {'timestamp': (21.0, 21.6), 'text': ' Wait a minute.'}, {'timestamp': (21.6, 22.6), 'text': ' OK.'}, {'timestamp': (19.87, 21.87), 'text': ' We will rush it. Wait a moment.'}]}

Here is the result with task="transcribe" and language="chinese":
{'text': " Hello, 1.30 For you for you. Hello. Hey, you know. This is the food's at the way a lot of the car businging's all the car all the car are still not not. No, it's It's just half a hour hours. Okay, we're we're okay, we're okay, we're let's let's let's wait. Thank you.", 'chunks': [{'timestamp': (0.0, 0.84), 'text': ' Hello,'}, {'timestamp': (0.84, 3.72), 'text': ' 1.30'}, {'timestamp': (3.72, 5.8), 'text': ' For you for you.'}, {'timestamp': (5.8, 6.64), 'text': ' Hello.'}, {'timestamp': (6.64, 8.36), 'text': ' Hey, you know.'}, {'timestamp': (8.36, 9.52), 'text': ' This is the'}, {'timestamp': (9.52, 10.68), 'text': " food's at the way"}, {'timestamp': (10.68, 11.72), 'text': ' a lot of the car'}, {'timestamp': (11.72, 12.8), 'text': " businging's"}, {'timestamp': (12.8, 13.72), 'text': ' all the car'}, {'timestamp': (13.72, 14.84), 'text': ' all the car'}, {'timestamp': (14.84, 15.92), 'text': ' are still'}, {'timestamp': (15.92, 15.96), 'text': ' not'}, {'timestamp': (15.96, 16.88), 'text': ' not.'}, {'timestamp': (16.88, 17.8), 'text': ' No,'}, {'timestamp': (17.8, 18.68), 'text': " it's"}, {'timestamp': (18.68, 18.96), 'text': " It's just half"}, {'timestamp': (18.96, 19.12), 'text': ' a'}, {'timestamp': (19.12, 19.16), 'text': ' hour'}, {'timestamp': (19.16, 19.72), 'text': ' hours.'}, {'timestamp': (19.72, 19.92), 'text': ' Okay,'}, {'timestamp': (19.92, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': " we're okay,"}, {'timestamp': (20.72, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': ' okay,'}, {'timestamp': (20.72, 22.36), 'text': " we're"}, {'timestamp': (22.36, 21.72), 'text': " let's"}, {'timestamp': (21.72, 22.92), 'text': " let's"}, {'timestamp': (22.92, None), 'text': " let's wait. Thank you."}]}

Here is the result returned by openai-large-v2:
{'text': '您好,话务员为您服务。你好。喂,你好。这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。这边所有 车辆都等着上班啊。哦,人还没到是吧?没到,已经都半个多小时了。行行,我们催一下,稍等,马上就到了。好,那我们先催一下,稍等,马上就要了。', 'chunks': [{'timestamp': (0.0, 6.0), 'text': '您好,话务员为您服务。'}, {'timestamp': (6.0, 7.0), 'text': '你好。'}, {'timestamp': (7.0, 8.0), 'text': '喂,你好。'}, {'timestamp': (8.0, 12.0), 'text': '这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。'}, {'timestamp': (12.0, 15.0), 'text': '这边所有车辆都等着上班啊。'}, {'timestamp': (15.0, 17.0), 'text': '哦,人还没到是吧?'}, {'timestamp': (17.0, 20.0), 'text': '没到,已经都半个多小时了 。'}, {'timestamp': (20.0, 23.0), 'text': '行行,我们催一下,稍等,马上就到了。'}, {'timestamp': (19.87, 21.87), 'text': '好,那我们先催一下,稍等,马上就要了。'}]}

Thanks for any advice.

@EarlWilliam
Author

It seems the model correctly recognizes the content of the audio, but translates it to English before returning it.

@sanchit-gandhi
Owner

Hey @EarlWilliam - this model is part of the Distil-Whisper series, and is thus trained on English speech only. This likely explains why it only transcribes in English. If you're interested in training a Distil-Whisper model in Chinese, refer to the training guide: https://github.com/huggingface/distil-whisper/tree/main/training

Otherwise, you can select a multilingual Whisper model from the Hugging Face Hub: https://huggingface.co/models?language=zh&other=whisper&sort=trending

The multilingual model you select will respect the language argument you pass to the pipeline.
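
One way to confirm that a checkpoint honours `language="chinese"` is to check the script of the returned text. The snippet below is a minimal sketch, not part of whisper_jax: `cjk_fraction` and `looks_chinese` are hypothetical helper names, the 0.5 threshold is an arbitrary assumption, and the only thing assumed about the pipeline output is that it is a dict with a `"text"` key, as in the results posted above.

```python
def cjk_fraction(text: str) -> float:
    """Return the share of alphabetic characters that are CJK ideographs.

    Only the basic CJK Unified Ideographs block (U+4E00..U+9FFF) is counted;
    extension blocks are ignored in this sketch.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(letters)


def looks_chinese(outputs: dict, threshold: float = 0.5) -> bool:
    """Heuristic: treat the transcript as Chinese if most letters are CJK.

    `outputs` is assumed to be a dict with a "text" key; `threshold` is an
    arbitrary cut-off chosen for illustration.
    """
    return cjk_fraction(outputs["text"]) >= threshold


# Short excerpts taken from the results posted in this issue.
english_result = {"text": " Hello, 130 is at your service. Hello."}
chinese_result = {"text": "您好,话务员为您服务。你好。"}

print(looks_chinese(english_result))
print(looks_chinese(chinese_result))
```

Calling `looks_chinese(outputs)` on the dict returned by `pipeline(...)` should flag the English output of the distilled checkpoint and accept the Chinese output of a multilingual one, such as the openai-large-v2 result above.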

@EarlWilliam
Author

Thank you for your explanation! I will give it a try.
