[Feature Request]: Lingua2 can discard tokens based on a probability threshold #150

Open
Meguminnnnnnnn opened this issue May 9, 2024 · 1 comment
Labels
feature request New feature or request

Comments

@Meguminnnnnnnn
Is your feature request related to a problem? Please describe.

Lingua2 struggles to achieve ideal compression with a fixed discard ratio or target length, because the ideal compression ratio (one that preserves all valid tokens and discards all redundant ones) varies from text to text.
I think setting a probability threshold, instead of a ratio or target length, could solve this problem: a token would be discarded if the probability of its 'discard' label exceeds the threshold.
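
To illustrate the difference, here is a minimal sketch in plain Python. It assumes the per-token 'discard' probabilities are already available from the classifier; the function names `compress_by_threshold` and `compress_by_ratio`, the sample tokens, and the probabilities are all hypothetical and are not part of the existing API.

```python
from typing import List, Sequence


def compress_by_threshold(
    tokens: Sequence[str],
    discard_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[str]:
    """Drop a token only if its 'discard' probability exceeds the threshold."""
    if len(tokens) != len(discard_probs):
        raise ValueError("tokens and discard_probs must have the same length")
    return [tok for tok, p in zip(tokens, discard_probs) if p <= threshold]


def compress_by_ratio(
    tokens: Sequence[str],
    discard_probs: Sequence[float],
    keep_ratio: float,
) -> List[str]:
    """Fixed-ratio baseline: keep the tokens with the lowest 'discard' probability."""
    if len(tokens) != len(discard_probs):
        raise ValueError("tokens and discard_probs must have the same length")
    n_keep = round(len(tokens) * keep_ratio)
    # Rank token positions from most to least worth keeping.
    ranked = sorted(range(len(tokens)), key=lambda i: discard_probs[i])
    # Restore the original word order among the kept positions.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


# Hypothetical example data, made up for illustration only.
tokens = ["Please", "kindly", "summarize", "the", "report"]
discard_probs = [0.90, 0.95, 0.05, 0.60, 0.10]

by_threshold = compress_by_threshold(tokens, discard_probs, threshold=0.5)
by_ratio = compress_by_ratio(tokens, discard_probs, keep_ratio=0.6)
```

With these made-up numbers, the fixed ratio of 0.6 has to keep three tokens, so it retains "the" even though the classifier considers it likely redundant, while the threshold keeps only the two tokens the classifier wants to preserve. The number of kept tokens then adapts to each text instead of being fixed in advance.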

Describe the solution you'd like

No response

Additional context

No response

@Meguminnnnnnnn Meguminnnnnnnn added the feature request New feature or request label May 9, 2024
@iofu728
Contributor

iofu728 commented May 10, 2024

Hi @Meguminnnnnnnn, thank you for your suggestion. We will enhance the related features in future iterations.
