[Analyzer] Topic Modeling #131
Comments
@lalitpagaria you can assign me this and #130
@shahrukhx01 Shahrukh, please confirm whether you've started anything on this yet. If not, I would like to collaborate.
Hi Akarsh, I haven't started anything yet. However, before starting I'd suggest you go through this:
Thanks @shahrukhx01, I went through the sentence_transformers library and its models for clustering, and I have some ideas myself.
@akar5h Actually, the main idea is to do this on both short texts (reviews) and long texts (news articles, emails, etc.). But yes, we could start with short texts first. This is how I see it from the user's point of view -
This would help users with pre-processing and give them a sense of what the texts are about.
Just adding to Lalit's comment: when I added this issue, my intention was to have a pipeline comparable to the zero-shot classifier. The end goal is to sort data into user-defined categories/clusters without any fine-tuning or training.
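To make the "user-defined categories without any training" idea concrete, here is a minimal sketch. It is not Obsei code and not the zero-shot classifier itself: it assumes each category is given as a short free-text description, and it uses plain bag-of-words cosine similarity as a stand-in for the sentence embeddings a real analyzer would likely use. The names `bag_of_words`, `cosine`, and `categorize`, and the sample categories, are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(texts, categories):
    """Assign each text to the most similar category description.

    No model is trained: the categories are only compared against
    each text at inference time (assumption: word overlap is a good
    enough proxy for this toy example).
    """
    category_vectors = {
        name: bag_of_words(description)
        for name, description in categories.items()
    }
    assignments = []
    for text in texts:
        vector = bag_of_words(text)
        best = max(
            category_vectors,
            key=lambda name: cosine(vector, category_vectors[name]),
        )
        assignments.append((text, best))
    return assignments


# Hypothetical user-defined categories, described in plain words.
categories = {
    "delivery": "delivery shipping package arrived late courier",
    "billing": "billing payment refund charged invoice price",
}
texts = [
    "My package arrived two days late",
    "I was charged twice and want a refund",
]
assignments = categorize(texts, categories)
for text, label in assignments:
    print(f"{label}: {text}")
```

Swapping `bag_of_words` for an embedding model would keep the same structure, and that version would need no GPU for small inputs either.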
Yes @shahrukhx01, this is important and very helpful for users who don't have the resources to run Obsei on GPUs. A few Obsei users have asked for it as well. So this will be another classification analyzer.
@shahrukhx01 @lalitpagaria
Also, I feel
Great @akar5h. Looking forward to it :)
Adding one more: https://github.com/ddangelov/Top2Vec
Hi Lalit @lalitpagaria, this topic analyzer takes unlabelled texts as input and clusters them. It inherits from BaseAnalyzer, so analyze_input is its base function. Is the above format for the clusters and their representations fine as output, or do you expect the computed labels in some other format? You can see this implemented here: I will create a PR by the end of the week, once I have integrated a non-deep-learning approach (LDA) into this analyzer.
Thanks a lot @akar5h
@akar5h I just had a cursory look and it looks fine to me. There are a few code-structure things which we can take forward in the PR review. @shahrukhx01 can you please have a look, as you have more context in this field?
@lalitpagaria sure, I will take a look at it over the weekend.
@akar5h could you please create a PR so we can discuss it there?
@lalitpagaria, will do for sure. I'm running on low bandwidth because of office work this week.
Hi @lalitpagaria @akar5h, I'm really sorry. I have been super consumed by a couple of other things lately. I will try to give my input on this within this week.
@akar5h @shahrukhx01 No issue, please take your time, no urgency :)
https://github.com/MaartenGr/BERTopic