Allow a regular expression to describe tokens to suppress #1997

Merged
merged 3 commits into from Apr 9, 2024

Conversation

ulatekh
Contributor

@ulatekh ulatekh commented Mar 26, 2024

Example: --suppress-tokens-re "[,.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.

Technique inspired by openai/whisper#1041 .
I used something like this while working with OpenAI's whisper & wanted to use it in this project.

Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.

Technique inspired by openai/whisper#1041

Co-authored-by: Georgi Gerganov <[email protected]>
Resolved review threads (now outdated): examples/command/command.cpp (two threads), whisper.cpp, whisper.h
@ulatekh
Contributor Author

ulatekh commented Mar 28, 2024

It should go without saying that I'm fine with your changes.

Contributor Author

@ulatekh ulatekh left a comment

Accepted changes

@ulatekh ulatekh force-pushed the suppress_tokens_re branch 4 times, most recently from 51dad18 to 4853a90 Compare April 1, 2024 17:03
@ggerganov ggerganov merged commit c8eeb93 into ggerganov:master Apr 9, 2024
53 checks passed
@ulatekh ulatekh deleted the suppress_tokens_re branch April 9, 2024 17:06
jiahansu pushed a commit to OOPRY/whisper.cpp that referenced this pull request Apr 17, 2024
* Allow a regular expression to describe tokens to suppress.

Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.

Technique inspired by openai/whisper#1041

Co-authored-by: Georgi Gerganov <[email protected]>

* Blind change to fix Java test.

---------

Co-authored-by: Georgi Gerganov <[email protected]>