Is your feature request related to a problem? Please describe.
Lingua2 struggles to reach an ideal compression with a fixed discard ratio or target length, because the ideal compression ratio, the one that preserves all valid tokens and discards all redundant ones, varies from text to text.
I think setting a probability threshold instead of a ratio or target length could solve this problem: a token would be discarded if its probability for the 'discard' label exceeds the threshold.
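To make the proposal concrete, here is a minimal sketch of threshold-based filtering. It is not Lingua2's actual API: the function name `filter_by_threshold`, its parameters, and the idea that per-token discard probabilities are available as a plain list are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def filter_by_threshold(
    tokens: Sequence[str],
    discard_probs: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[str], float]:
    """Hypothetical helper: drop tokens whose discard probability exceeds `threshold`.

    `discard_probs[i]` is assumed to be the classifier's probability of the
    'discard' label for `tokens[i]`. Returns the kept tokens and the fraction
    of tokens that were kept.
    """
    if len(tokens) != len(discard_probs):
        raise ValueError("tokens and discard_probs must have the same length")

    # A token is discarded only if its discard probability exceeds the threshold.
    kept = [tok for tok, p in zip(tokens, discard_probs) if p <= threshold]

    # The kept fraction is an outcome of the threshold, not an input.
    kept_fraction = len(kept) / len(tokens) if tokens else 1.0
    return kept, kept_fraction
```

With this approach the compression ratio is no longer fixed in advance: a text with many high-probability 'discard' tokens shrinks a lot, while a dense text with few of them is left mostly intact under the same threshold.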
Describe the solution you'd like
No response
Additional context
No response