Hi again, I thought it'd be useful to have these token/word-level timestamps available from the whisper implementation here.
The first commit adds an option (`--cache`) to use the KV cache on the decoder self-attention, in which case after the initial pass only the final token needs to be fed through the decoder. My main goal isn't really to improve performance, but I couldn't get the timestamps working without this change. The speed gains on the CPU appear significant (about 4x in my case), though I haven't done any true benchmarking, and any gains on the GPU or with the quantized models are less noticeable.

For the timestamps I mainly followed OpenAI's Python implementation, and given the same inputs the timestamps should match closely.
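In case it helps to illustrate why the cache change is safe: with a KV cache, the keys and values for earlier tokens are stored once, so each new step only computes the newest token's query against the cached rows, and the result is identical to recomputing attention over the full sequence. Here's a toy single-head sketch in plain NumPy (not the actual ggml code; dimensions and weights are made up):

```python
import numpy as np

def attention(q, k, v):
    # scaled dot-product attention; `offset` shifts the causal mask
    # so a single cached-decode query can still see the whole prefix
    scores = q @ k.T / np.sqrt(q.shape[-1])
    offset = k.shape[0] - q.shape[0]
    mask = np.tril(np.ones((q.shape[0], k.shape[0])), k=offset)
    scores = np.where(mask == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((5, d))  # 5 decoder tokens, model dim 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# 1) no cache: recompute attention over all tokens at once
full = attention(x @ Wq, x @ Wk, x @ Wv)

# 2) with a KV cache: append the new token's K/V rows each step and
#    only run the newest token's query against the cache
k_cache = np.zeros((0, d))
v_cache = np.zeros((0, d))
rows = []
for t in range(5):
    xt = x[t:t + 1]
    k_cache = np.vstack([k_cache, xt @ Wk])
    v_cache = np.vstack([v_cache, xt @ Wv])
    rows.append(attention(xt @ Wq, k_cache, v_cache))
incremental = np.vstack(rows)

assert np.allclose(full, incremental)
```

The two paths produce the same activations, which is why the cached decode can drive the timestamp extraction without changing the model's output.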
With these changes, passing the `--dtw-timestamps` flag prints the word-level timestamps.

Time permitting, here are some things I'd still like to do here, in no particular order:
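For anyone unfamiliar with the approach: the timestamps come from running dynamic time warping over a token-by-frame cost matrix built from the decoder's cross-attention weights, so each text token gets a monotonic alignment to encoder audio frames. A toy NumPy sketch of that DTW pass (not the actual implementation; the attention matrix here is made up):

```python
import numpy as np

def dtw_path(cost):
    """Monotonic alignment between N tokens (rows) and M frames
    (cols) minimizing total cost via dynamic time warping."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    trace = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # 0 = diagonal step, 1 = token spans another frame,
            # 2 = token shares a frame with the previous token
            choices = (acc[i - 1, j - 1], acc[i, j - 1], acc[i - 1, j])
            trace[i, j] = int(np.argmin(choices))
            acc[i, j] = cost[i - 1, j - 1] + choices[trace[i, j]]
    # backtrace from the bottom-right corner
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = trace[i, j]
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            j -= 1
        else:
            i -= 1
    return path[::-1]

# made-up cross-attention weights: high weight means token i is
# "looking at" frame j, so the DTW cost is the negated matrix
attn = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.8],
])
path = dtw_path(-attn)

# a token's start time is the first frame it aligns to, scaled by
# the frame duration (about 20 ms per encoder frame in Whisper)
first_frame = {}
for tok, frame in path:
    first_frame.setdefault(tok, frame)
```

The real implementation also averages attention heads and normalizes the matrix before the DTW pass, but the alignment step itself is this standard recurrence.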
Any feedback is appreciated 👍