Make tokenizer.cpp CLI tool nicer.

Before this commit, tokenize was a simple CLI tool invoked like this:

  tokenize MODEL_FILENAME PROMPT [--ids]

This simple tool loads the model, takes the prompt, and shows the tokens
that llama.cpp produces for it.

This changeset makes the tokenize tool more sophisticated and more useful
for debugging and troubleshooting:

  tokenize [-m, --model MODEL_FILENAME]
           [--ids]
           [--stdin]
           [--prompt]
           [-f, --file]
           [--no-bos]
           [--log-disable]
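
To illustrate how the new synopsis might map onto argument handling, here is a minimal C++ sketch. It is not the code from this commit: the struct, its field names, and parse_tokenize_args are hypothetical. It also assumes that --prompt and -f/--file each take one following value, and that exactly one of --stdin, --prompt, or --file must supply the prompt.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical option set mirroring the synopsis; names are illustrative only.
struct TokenizeOptions {
    std::string model_path;                 // -m, --model MODEL_FILENAME
    bool print_ids_only = false;            // --ids
    bool read_stdin = false;                // --stdin
    std::optional<std::string> prompt;      // --prompt (assumed to take a value)
    std::optional<std::string> prompt_file; // -f, --file (assumed to take a value)
    bool add_bos = true;                    // --no-bos turns this off
    bool disable_logging = false;           // --log-disable
};

// Returns std::nullopt on any usage error: unknown flag, missing value,
// missing model, or not exactly one prompt source (an assumed rule).
std::optional<TokenizeOptions> parse_tokenize_args(const std::vector<std::string>& args) {
    TokenizeOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        // Consume the value that follows a flag, if there is one.
        auto next_value = [&](std::string& out) -> bool {
            if (i + 1 >= args.size()) return false;
            out = args[++i];
            return true;
        };
        std::string value;
        if (a == "-m" || a == "--model") {
            if (!next_value(value)) return std::nullopt;
            opts.model_path = value;
        } else if (a == "--ids") {
            opts.print_ids_only = true;
        } else if (a == "--stdin") {
            opts.read_stdin = true;
        } else if (a == "--prompt") {
            if (!next_value(value)) return std::nullopt;
            opts.prompt = value;
        } else if (a == "-f" || a == "--file") {
            if (!next_value(value)) return std::nullopt;
            opts.prompt_file = value;
        } else if (a == "--no-bos") {
            opts.add_bos = false;
        } else if (a == "--log-disable") {
            opts.disable_logging = true;
        } else {
            return std::nullopt; // unknown argument
        }
    }
    // Count how many prompt sources were given; require exactly one.
    const int sources = int(opts.read_stdin) + int(opts.prompt.has_value()) +
                        int(opts.prompt_file.has_value());
    if (opts.model_path.empty() || sources != 1) return std::nullopt;
    return opts;
}
```

Returning std::nullopt for every usage error keeps the sketch short; a real tool would more likely print a specific message for each case.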

It also behaves better on Windows now: it correctly interprets and renders
Unicode from command-line arguments and pipes, regardless of which code
page the user has set in their terminal.
Noeda committed Mar 26, 2024
1 parent 557410b commit cd7b5f7
Showing 1 changed file with 407 additions and 10 deletions.
