This line for training a Byte-Level BPE has an error. You have to add an initial alphabet of bytes; otherwise the tokenizer cannot fall back to bytes when a token is missing from the vocabulary, and characters from your string can be lost when it is decoded.
For reference, and to help other people, the training of a Byte-Level BPE should go as in this example.
Here is some shortened code so you don't have to follow the link or piece it together from a broken-up tutorial:
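A minimal sketch of such a training setup, assuming the Hugging Face `tokenizers` Python package. The function name `train_byte_level_bpe`, the vocabulary size, the special token, and the toy corpus are illustrative assumptions and are not taken from the linked example. The relevant part is `initial_alphabet=pre_tokenizers.ByteLevel.alphabet()`, which seeds the vocabulary with all 256 byte symbols.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers


def train_byte_level_bpe(corpus, vocab_size=300):
    """Train a Byte-Level BPE tokenizer on an iterable of strings.

    Hypothetical helper for illustration only.
    """
    tokenizer = Tokenizer(models.BPE())
    # Map raw bytes to printable symbols before BPE merges are applied.
    tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
    # Reverse the byte-to-symbol mapping when decoding.
    tokenizer.decoder = decoders.ByteLevel()

    trainer = trainers.BpeTrainer(
        vocab_size=vocab_size,  # assumed value; must exceed the 256 byte symbols
        # The fix: start from the full byte alphabet, so every byte has a
        # token even if it never occurs in the training data.
        initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
        special_tokens=["<|endoftext|>"],  # assumed special token
        show_progress=False,
    )
    tokenizer.train_from_iterator(corpus, trainer=trainer)
    return tokenizer


# Toy corpus (assumption) that contains only plain ASCII text.
tokenizer = train_byte_level_bpe(["hello world", "byte level bpe"])

# The input below uses characters that never appeared during training.
text = "héllo ☃"
ids = tokenizer.encode(text).ids
decoded = tokenizer.decode(ids)
```

With the byte alphabet in place, `decoded` should match `text` even though `é` and `☃` were never seen during training. Without `initial_alphabet`, the bytes of those characters have no token and are dropped from the output.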