Presence of character Ġ before each token in output #87
Comments
This happened to me when using GPT-2. I solved it by adding the following line right after the first loop line of the nmf.explore() method:
Yeah, that shouldn't happen. A number of tokenizers prefix tokens with a marker character such as Ġ to encode how a token relates to the token before it in the sequence (in GPT-2's byte-level BPE, Ġ stands for a leading space). That is why rendering the output has to be done in tandem with the tokenizer and its settings.
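A minimal sketch of where Ġ comes from, assuming a GPT-2 style byte-level BPE tokenizer. GPT-2 maps every byte to a printable character before applying BPE, and bytes that are not printable on their own (including the space, 0x20) are shifted to code points 256 and above, so a space becomes U+0120 "Ġ" and a newline becomes "Ċ". The helper names `BYTE_ENCODER`, `BYTE_DECODER` and `token_to_text` are illustrative only; this is not Ecco's code. With a Hugging Face tokenizer, `tokenizer.convert_tokens_to_string(...)` performs the same reverse mapping.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table (byte-level BPE)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space 0x20, ...) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}


def token_to_text(token: str) -> str:
    """Turn a raw GPT-2 token string back into the text it represents."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    tokens = ["Hello", "Ġworld", "Ġ!", "Ċ"]
    print(BYTE_ENCODER[ord(" ")])               # Ġ
    print([token_to_text(t) for t in tokens])   # ['Hello', ' world', ' !', '\n']
```

Decoding each token through the tokenizer's own byte table, rather than just stripping "Ġ", keeps other remapped bytes (newlines, non-ASCII UTF-8 sequences) correct as well.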
I was working on the "05- Neuron Factors.ipynb" notebook and noticed the character Ġ before each token in the output. The output is from the code "nmf_1.explore()". I am not sure why it does that. Please see the screenshot below.
Your help is appreciated.