Repro Case for our Optimum Issue with Korean

This is a minimal reproducible example of a bug we're hitting when converting Moonshine models to ONNX using Optimum.

In our situation this is only happening with Korean, so we're trying to track down what's going wrong. It's possible we're doing something wrong in our onnx-moonshine package that only shows up with non-English characters?

Installation

pip install -r requirements.txt

Downloading and Converting Models

Here's the script and arguments I'm using to get the ONNX versions of the Korean Moonshine models.

./download-moonshine-model.sh base ko

Internally it's calling optimum-cli export onnx --model ....

Reproducing the Problem

This repo includes an evaluation script that uses the FLEURS test set to calculate word error rate (WER) and character error rate (CER).

If you run this command it will print out the CER for Korean using Transformers:

python eval-moonshine-model.py --framework transformers --language ko_kr
CER: 9.04%

This error rate is what we'd expect. However, if you run the same script using the ONNX versions of the models, you'll see a much worse error rate:

python eval-moonshine-model.py --framework onnx --language ko_kr
CER: 27.31%

Interestingly this doesn't happen with the English base models:

python eval-moonshine-model.py --framework onnx --language en_us
WER: 11.36%

python eval-moonshine-model.py --framework transformers --language en_us
WER: 11.36%
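
For readers unfamiliar with the metrics, here is a minimal sketch of how a CER or WER figure like the ones above can be computed: total edit distance divided by total reference length. The function names `edit_distance` and `error_rate` are hypothetical and are not taken from eval-moonshine-model.py, which may normalize text first (for example via koreantextnormalizer.py) and may aggregate differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="char"):
    """Corpus-level error rate: total edits / total reference length.

    unit="char" gives a CER-style score, unit="word" a WER-style score.
    Assumption: no text normalization is applied here.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(references, hypotheses):
        if unit == "word":
            ref_units, hyp_units = ref.split(), hyp.split()
        else:
            ref_units, hyp_units = list(ref), list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_len += len(ref_units)
    if total_len == 0:
        raise ValueError("references contain no units to score")
    return total_edits / total_len
```

Scoring the Transformers and ONNX transcripts of the same utterances this way, one utterance at a time, is one possible way to find where the two frameworks start to diverge.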

