feat: smooth and combine token output #936
Merged
+264
−103
I noticed while using the Anthropic endpoint that the output would jump by ~10 tokens, pause for ~200ms, and repeat. I've implemented a smoothing function for the streamed output that estimates the tokens/s rate and emits output at that rate instead.
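A minimal sketch of the smoothing idea, in Python (names and structure are illustrative, not this PR's actual code): buffer tokens as they arrive in bursts, estimate the recent tokens/s rate, and re-emit the buffer with a `1/rate` delay between tokens so rendering looks steady.

```python
import time
from collections import deque

class TokenSmoother:
    """Buffers bursty streamed tokens and re-emits them at the
    estimated arrival rate. Hypothetical sketch, not the PR's API."""

    def __init__(self, window=20):
        self.buffer = deque()                 # tokens waiting to be shown
        self.times = deque(maxlen=window)     # arrival timestamps (recent window)

    def feed(self, token):
        """Record a token as it arrives from the stream."""
        self.times.append(time.monotonic())
        self.buffer.append(token)

    def rate(self):
        """Estimated tokens/second over the recent arrival window."""
        if len(self.times) < 2:
            return 0.0
        span = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / span if span > 0 else 0.0

    def drain(self):
        """Yield buffered tokens, pacing them at ~1/rate seconds apart."""
        r = self.rate()
        delay = 1.0 / r if r > 0 else 0.0
        while self.buffer:
            yield self.buffer.popleft()
            if delay and self.buffer:
                time.sleep(delay)
```

In practice the rate estimate would be updated continuously while draining, so pauses in the upstream burst pattern don't stall the display; this sketch only paces one buffered burst at a time.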
I also added a chunking function that attempts to concatenate tokens like `do` and `n't`. It made the output a bit easier to read, especially for long words and code, which are split quite heavily.

Some stats for chunking (Anthropic Claude 3 Opus):
~10% fewer tokens in regular text
~25% fewer tokens in mix of 80% code and 20% text
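The chunking step could look something like this sketch (assumed logic, not the PR's exact implementation): merge consecutive subword tokens into one chunk, and start a new chunk when a token begins with whitespace, so pieces like `do` + `n't` render together.

```python
def chunk_tokens(tokens):
    """Merge consecutive subword tokens into word-boundary chunks.
    A token starting with whitespace begins a new chunk.
    Hypothetical sketch of the chunking idea, not the PR's code."""
    chunks = []
    current = ""
    for tok in tokens:
        # A leading space/newline marks a word boundary: flush the
        # accumulated chunk and start a new one with this token.
        if current and tok[:1].isspace():
            chunks.append(current)
            current = tok
        else:
            current += tok
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_tokens(["do", "n't", " stop"])` yields `["don't", " stop"]`, which is where the ~10–25% reduction in emitted chunks would come from.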
Also, to make these changes easier, I moved the logic into a separate utils file. But I'm pretty sure that's not the right place for it :D