
whisper : improve handling of prompts #1981

Merged · 2 commits merged into master on Mar 25, 2024

Conversation

@ggerganov (Owner) commented on Mar 21, 2024

fix #1960, #1961, #1962

Changed whisper_tokenize() to return the negative of the number of required tokens when the token buffer is not big enough. This can be used to determine the number of tokens needed for a given text:

int n_needed = -whisper_tokenize(ctx, "some text", NULL, 0);
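A minimal sketch of the two-pass pattern this enables; ctx is assumed to be an initialized whisper_context pointer and text a std::string (both names illustrative):

// first pass: query the required token count
const int n_needed = -whisper_tokenize(ctx, text.c_str(), nullptr, 0);
// second pass: tokenize into a buffer of exactly that size
std::vector<whisper_token> tokens(n_needed);
const int n_tokens = whisper_tokenize(ctx, text.c_str(), tokens.data(), n_needed);
// on success n_tokens == n_needed; a negative result would mean the buffer is still too small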

Also improved the docs by clarifying that the Whisper models can use prompts of at most n_text_ctx/2 tokens, which is 224 tokens. There is no point in providing longer prompts, as they would be truncated:

whisper.cpp/whisper.cpp, lines 5474 to 5480 at 48a1452:

// if we have already generated some text, use it as a prompt to condition the next generation
if (!prompt_past.empty() && t_cur < 0.5f && params.n_max_text_ctx > 0) {
int n_take = std::min(std::min(params.n_max_text_ctx, whisper_n_text_ctx(ctx)/2), int(prompt_past.size()));
prompt = { whisper_token_prev(ctx) };
prompt.insert(prompt.begin() + 1, prompt_past.end() - n_take, prompt_past.end());
}
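A hypothetical caller-side check (not part of this PR) built on the same primitives, assuming prompt is a std::string:

const int n_half_ctx = whisper_n_text_ctx(ctx)/2; // 224 for the released models
const int n_prompt   = -whisper_tokenize(ctx, prompt.c_str(), nullptr, 0);
if (n_prompt > n_half_ctx) {
    // only the last n_half_ctx tokens of the prompt would be used
    fprintf(stderr, "prompt has %d tokens; only the last %d will be used\n", n_prompt, n_half_ctx);
}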

@ggerganov (Owner, Author) commented:

@sindresorhus lmk if this change would work for you

@sindresorhus (Contributor) commented on Mar 24, 2024

Thanks for looking into this. That would work, although I still think wrapping it up into a separate function would make for a better API and make it more discoverable for consumers. I would never have thought of using whisper_tokenize like that.

// Returns the number of tokens that `text` would be tokenized into.
int whisper_token_count(struct whisper_context * ctx, const char * text) {
    return -whisper_tokenize(ctx, text, NULL, 0);
}
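A usage sketch, assuming the helper lands as proposed; params.prompt is borrowed from the CLI example further down:

if (whisper_token_count(ctx, params.prompt.c_str()) > whisper_n_text_ctx(ctx)/2) {
    // the prompt exceeds the window and will be truncated to its last n_text_ctx/2 tokens
}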

@sindresorhus (Contributor) commented:

fix #1960, #1961, #1962

By the way, you need to make that:

fix #1960, fix #1961, fix #1962

For GitHub to do the right thing and close them all when this PR is merged.

@@ -207,7 +207,7 @@ void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & params)
     fprintf(stderr, " -nt, --no-timestamps [%-7s] do not print timestamps\n", params.no_timestamps ? "true" : "false");
     fprintf(stderr, " -l LANG, --language LANG [%-7s] spoken language ('auto' for auto-detect)\n", params.language.c_str());
     fprintf(stderr, " -dl, --detect-language [%-7s] exit after automatically detecting language\n", params.detect_language ? "true" : "false");
-    fprintf(stderr, " --prompt PROMPT [%-7s] initial prompt\n", params.prompt.c_str());
+    fprintf(stderr, " --prompt PROMPT [%-7s] initial prompt (max n_text_ctx/2 tokens)\n", params.prompt.c_str());
@sindresorhus (Contributor) commented on the diff:

I would document the 224 number here for quick reference.

Same in

// maximum of whisper_n_text_ctx()/2 tokens are used
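One possible wording (hypothetical; the exact phrasing is up to the author):

// maximum of whisper_n_text_ctx()/2 tokens are used (224 for the released Whisper models)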

@ggerganov merged commit 1558ec5 into master on Mar 25, 2024. 6 checks passed.
jiahansu pushed a commit to OOPRY/whisper.cpp referencing this pull request on Apr 17, 2024:

* whisper : improve handling of prompts
* whisper : add whisper_token_count helper
Successfully merging this pull request may close these issues.

Document max tokens for params.initial_prompt