Online GapEncoder #1439
@MaxHalford I can take this up. I need some getting-started material for doing this on streams. Will go through the paper and |
skrub is a wonderful new project related to scikit-learn. You can see Gaël Varoquaux present it here. They have a transformer called GapEncoder: it's a way to embed fuzzy strings. This could be really powerful online, say for classifying Tweets or Twitch messages, where typos are aplenty.

We already have a way to do online TF-IDF/count vectorization, but we don't have Gamma-Poisson matrix factorization. It is doable online though. Once we have it, we could assemble the two into a nice GapEncoder class. See paper here.
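To make the count-vectorization half concrete, here is a minimal standard-library sketch of character n-gram counting, which is the kind of input a GapEncoder-style factorization would consume. It is only an illustration: the helper names `char_ngrams` and `cosine`, the choice of trigrams, and the space padding are assumptions made for this example, not the skrub or river implementation. Each string is processed on its own, so the step fits a one-sample-at-a-time stream.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, space-padded string.

    Hypothetical helper for illustration; n=3 is an assumed default.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A misspelled word still shares most of its trigrams with the clean spelling,
# while an unrelated word shares none.
clean = char_ngrams("streaming")
typo = char_ngrams("streamming")
other = char_ngrams("keyboard")

sim_typo = cosine(clean, typo)
sim_other = cosine(clean, other)
print(f"clean vs typo:  {sim_typo:.3f}")
print(f"clean vs other: {sim_other:.3f}")
```

The factorization step (Gamma-Poisson or otherwise) would then learn a low-dimensional embedding from these sparse counts; that part is not sketched here.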
This is related to #1412. Indeed, maybe this works well without Gamma-Poisson matrix factorization. For instance, we could use decomposition.LDA, which we already have.