Has translate been integrated into transcribe? It returns English, but I expect Chinese. #183
It seems the model correctly recognizes the content of the audio, but translates it into English before returning it.
Hey @EarlWilliam, this model is part of the Distil-Whisper series and is therefore trained on English speech only. This likely explains why it only returns English.

If you're interested in training a Distil-Whisper model in Chinese, refer to the training guide: https://github.com/huggingface/distil-whisper/tree/main/training

Otherwise, you can select a multilingual Whisper model from the Hugging Face Hub: https://huggingface.co/models?language=zh&other=whisper&sort=trending

This model will respect the
Thank you for your explanation! I will give it a try.
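For background on what the `task` and `language` arguments control: multilingual Whisper checkpoints select the language and task through special tokens at the start of the decoder prompt (`<|startoftranscript|>`, a language token such as `<|zh|>`, then `<|transcribe|>` or `<|translate|>`, plus `<|notimestamps|>` when timestamps are off). The sketch below builds that prefix as plain strings for illustration only; `build_decoder_prefix` and its three-entry language table are hypothetical, and real pipelines convert these tokens to IDs internally. Setting the prefix correctly does not teach an English-only model to write Chinese, which is consistent with the English output reported with the Distil-Whisper checkpoint in this thread.

```python
# Hypothetical helper (not part of whisper_jax or transformers): builds the
# special-token prefix that multilingual Whisper uses to pick language and task.
# Only a few languages are listed here for illustration.
LANGUAGE_CODES = {
    "english": "en",
    "chinese": "zh",
    "japanese": "ja",
}

TASKS = ("transcribe", "translate")


def build_decoder_prefix(language: str, task: str, timestamps: bool) -> list[str]:
    """Return the Whisper decoder prompt tokens for a language/task pair."""
    lang = language.lower()
    # Accept either a full language name ("chinese") or a code ("zh").
    code = LANGUAGE_CODES.get(lang, lang)
    if code not in LANGUAGE_CODES.values():
        raise ValueError(f"unsupported language: {language!r}")
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    prefix = ["<|startoftranscript|>", f"<|{code}|>", f"<|{task}|>"]
    if not timestamps:
        # When timestamps are disabled, <|notimestamps|> ends the prefix.
        prefix.append("<|notimestamps|>")
    return prefix


print(build_decoder_prefix("chinese", "transcribe", timestamps=True))
# ['<|startoftranscript|>', '<|zh|>', '<|transcribe|>']
```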
Hi all. I am trying to use large-v3-32-2-conditioned-prompt-logic-timestamped to transcribe Chinese-language audio. However, it returns an English translation of the original Chinese content.
Here is the code:
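```python
from whisper_jax import FlaxWhisperPipline  # class name is spelled this way in whisper_jax

# Load the Distil-Whisper checkpoint
pipeline = FlaxWhisperPipline("sanchit-gandhi/large-v3-32-2-conditioned-prompt-logic-timestamped")

# Ask for a Chinese transcription with chunk-level timestamps
outputs = pipeline("R1.wav", task="transcribe", return_timestamps=True, language="chinese")
print(outputs)
```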
Here is the result with task="translate" and language="chinese":
{'text': " Hello, 130 is at your service. Hello. Hello. The parking lot at the entrance is full of black Mercedes Benz cars. All cars are waiting to go to work. Oh, the police station is not there yet, right? Not yet. It's been more than half an hour. OK, we'll hurry up. Wait a minute. OK. We will rush it. Wait a moment.", 'chunks': [{'timestamp': (0.0, 5.6), 'text': ' Hello, police station 130 is at your service.'}, {'timestamp': (5.6, 6.4), 'text': ' Hello.'}, {'timestamp': (6.4, 7.8), 'text': ' Hello.'}, {'timestamp': (7.8, 12.4), 'text': ' The parking lot at the entrance is full of black Mercedes Benz cars.'}, {'timestamp': (12.4, 14.8), 'text': ' All cars are waiting to go to work.'}, {'timestamp': (14.8, 16.6), 'text': ' Oh, the police station is not there yet, right?'}, {'timestamp': (16.6, 19.6), 'text': " Not yet. It's been more than half an hour."}, {'timestamp': (19.6, 21.0), 'text': " OK, we'll hurry up."}, {'timestamp': (21.0, 21.6), 'text': ' Wait a minute.'}, {'timestamp': (21.6, 22.6), 'text': ' OK.'}, {'timestamp': (19.87, 21.87), 'text': ' We will rush it. Wait a moment.'}]}
Here is the result with task="transcribe" and language="chinese":
{'text': " Hello, 1.30 For you for you. Hello. Hey, you know. This is the food's at the way a lot of the car businging's all the car all the car are still not not. No, it's It's just half a hour hours. Okay, we're we're okay, we're okay, we're let's let's let's wait. Thank you.", 'chunks': [{'timestamp': (0.0, 0.84), 'text': ' Hello,'}, {'timestamp': (0.84, 3.72), 'text': ' 1.30'}, {'timestamp': (3.72, 5.8), 'text': ' For you for you.'}, {'timestamp': (5.8, 6.64), 'text': ' Hello.'}, {'timestamp': (6.64, 8.36), 'text': ' Hey, you know.'}, {'timestamp': (8.36, 9.52), 'text': ' This is the'}, {'timestamp': (9.52, 10.68), 'text': " food's at the way"}, {'timestamp': (10.68, 11.72), 'text': ' a lot of the car'}, {'timestamp': (11.72, 12.8), 'text': " businging's"}, {'timestamp': (12.8, 13.72), 'text': ' all the car'}, {'timestamp': (13.72, 14.84), 'text': ' all the car'}, {'timestamp': (14.84, 15.92), 'text': ' are still'}, {'timestamp': (15.92, 15.96), 'text': ' not'}, {'timestamp': (15.96, 16.88), 'text': ' not.'}, {'timestamp': (16.88, 17.8), 'text': ' No,'}, {'timestamp': (17.8, 18.68), 'text': " it's"}, {'timestamp': (18.68, 18.96), 'text': " It's just half"}, {'timestamp': (18.96, 19.12), 'text': ' a'}, {'timestamp': (19.12, 19.16), 'text': ' hour'}, {'timestamp': (19.16, 19.72), 'text': ' hours.'}, {'timestamp': (19.72, 19.92), 'text': ' Okay,'}, {'timestamp': (19.92, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': " we're okay,"}, {'timestamp': (20.72, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': ' okay,'}, {'timestamp': (20.72, 22.36), 'text': " we're"}, {'timestamp': (22.36, 21.72), 'text': " let's"}, {'timestamp': (21.72, 22.92), 'text': " let's"}, {'timestamp': (22.92, None), 'text': " let's wait. Thank you."}]}
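The pipeline output is a dict with a `text` field and a list of `chunks`, each holding a `(start, end)` timestamp tuple. To compare runs side by side, a small helper can print one chunk per line. `format_chunks` is a hypothetical helper, not part of whisper_jax. It tolerates a final end time of `None`, as in the last chunk of the transcribe output above, and flags chunks whose end time precedes their start time, which happens at least twice in that output (for example `(20.72, 20.36)`).

```python
# Hypothetical helper (not part of whisper_jax): prints pipeline chunks one per line.
# Assumes the output shape shown in this thread:
# {"text": ..., "chunks": [{"timestamp": (start, end), "text": ...}, ...]}


def format_chunks(outputs: dict) -> list[str]:
    lines = []
    for chunk in outputs["chunks"]:
        start, end = chunk["timestamp"]
        # The last chunk may have no end time (end is None).
        end_str = "?" if end is None else f"{end:.2f}"
        # Flag chunks whose end time is earlier than their start time.
        flag = "  <- end before start" if end is not None and end < start else ""
        lines.append(f"[{start:.2f} - {end_str}] {chunk['text'].strip()}{flag}")
    return lines


# Usage with a few chunks copied from the transcribe output above.
sample = {
    "text": " Hello. we're let's wait. Thank you.",
    "chunks": [
        {"timestamp": (5.8, 6.64), "text": " Hello."},
        {"timestamp": (20.72, 20.36), "text": " we're"},
        {"timestamp": (22.92, None), "text": " let's wait. Thank you."},
    ],
}
for line in format_chunks(sample):
    print(line)
```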
Here is the result returned by openai-large-v2:
{'text': '您好,话务员为您服务。你好。喂,你好。这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。这边所有 车辆都等着上班啊。哦,人还没到是吧?没到,已经都半个多小时了。行行,我们催一下,稍等,马上就到了。好,那我们先催一下,稍等,马上就要了。', 'chunks': [{'timestamp': (0.0, 6.0), 'text': '您好,话务员为您服务。'}, {'timestamp': (6.0, 7.0), 'text': '你好。'}, {'timestamp': (7.0, 8.0), 'text': '喂,你好。'}, {'timestamp': (8.0, 12.0), 'text': '这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。'}, {'timestamp': (12.0, 15.0), 'text': '这边所有车辆都等着上班啊。'}, {'timestamp': (15.0, 17.0), 'text': '哦,人还没到是吧?'}, {'timestamp': (17.0, 20.0), 'text': '没到,已经都半个多小时了 。'}, {'timestamp': (20.0, 23.0), 'text': '行行,我们催一下,稍等,马上就到了。'}, {'timestamp': (19.87, 21.87), 'text': '好,那我们先催一下,稍等,马上就要了。'}]}
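When comparing checkpoints like this, it can help to check the script of the returned text automatically rather than by eye. A minimal sketch, assuming that Chinese output is dominated by characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF); `cjk_ratio`, `looks_chinese`, and the 0.5 threshold are hypothetical choices, not part of whisper_jax or Whisper.

```python
# Hypothetical check (not part of whisper_jax or Whisper): estimates whether the
# returned text is Chinese from the share of CJK Unified Ideographs among letters.


def cjk_ratio(text: str) -> float:
    # Keep only alphabetic characters; CJK ideographs count as alphabetic,
    # while digits and punctuation such as "," and "。" do not.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(letters)


def looks_chinese(outputs: dict, threshold: float = 0.5) -> bool:
    # The 0.5 threshold is an arbitrary assumption; tune it for your data.
    return cjk_ratio(outputs["text"]) >= threshold


# Usage with text excerpts from the outputs above.
print(looks_chinese({"text": "您好,话务员为您服务。"}))           # True
print(looks_chinese({"text": " Hello, 130 is at your service."}))  # False
```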
Thanks for any advice.