Yake Stemming? #136

Open
ace-kay-law-neo opened this issue Jul 14, 2020 · 2 comments


@ace-kay-law-neo

My team uses your yake extractor. We get pretty good results.

We would like to enable stemming. However, after studying the source code, the stemming implementation seems to have two major shortcomings.

With stemming enabled, the final weight calculation doesn't take term frequency into account, whereas the plain-word implementation does (lines 358 and 390). Additionally, the plain-word implementation differentiates between stopwords and non-stopwords (lines 365-383).
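To illustrate why the term-frequency factor matters, here is a minimal sketch of the candidate score as defined in the YAKE paper, S(kw) = prod(S(w)) / (TF(kw) * (1 + sum(S(w)))), where a lower score means a better keyphrase. The function name `candidate_score` and its parameters are hypothetical and are not taken from the library's source; the word weights are made-up values.

```python
from math import prod


def candidate_score(word_weights, term_frequency=1):
    """YAKE-style candidate score (lower is better).

    word_weights: per-word weights S(w) of the words in the candidate.
    term_frequency: number of occurrences of the candidate, TF(kw).
    Hypothetical helper for illustration only.
    """
    if term_frequency < 1:
        raise ValueError("term_frequency must be at least 1")
    return prod(word_weights) / (term_frequency * (1 + sum(word_weights)))


weights = [0.5, 0.5]  # made-up word weights

# Ignoring term frequency is the same as treating every candidate as TF = 1.
score_without_tf = candidate_score(weights)
# A candidate seen 4 times gets a score 4 times lower, so it ranks higher.
score_with_tf = candidate_score(weights, term_frequency=4)

print(score_without_tf, score_with_tf)
```

With these values, dropping the term frequency leaves a frequent candidate with the same score as one that occurs only once.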

Would it be possible to offer the same features in the stemmed version? In general, should we use the stemmed or the non-stemmed version? What would you suggest?

My team and I are grateful for any help. :)

@janandreschweiger

@ygorg Please have a look at this. I can confirm this shortcoming.

@ygorg
Collaborator

ygorg commented Jan 8, 2021

Hi,
I tried a few things. I need to ensure that performance is the same as before, so I need to find a way to show that the updated code is backwards compatible.
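One possible way to check this, sketched here as an assumption rather than as the maintainers' method: run the old and the updated code on the same documents and compare the ranked `(keyphrase, score)` lists. The helper `rankings_match` is hypothetical, and the sample data below is made up.

```python
import math


def rankings_match(old, new, rel_tol=1e-9):
    """Return True if two ranked lists of (keyphrase, score) pairs agree.

    Keyphrases must appear in the same order, and scores must be equal
    up to a relative tolerance. Hypothetical helper for illustration.
    """
    if len(old) != len(new):
        return False
    return all(
        k_old == k_new and math.isclose(s_old, s_new, rel_tol=rel_tol)
        for (k_old, s_old), (k_new, s_new) in zip(old, new)
    )


# Made-up outputs standing in for the old and the updated implementation.
before = [("keyphrase extraction", 0.012), ("stemming", 0.034)]
after_same = [("keyphrase extraction", 0.012), ("stemming", 0.034)]
after_changed = [("stemming", 0.020), ("keyphrase extraction", 0.012)]

print(rankings_match(before, after_same))
print(rankings_match(before, after_changed))
```

Running such a comparison over a fixed set of documents with stemming disabled would show whether the non-stemmed behaviour is unchanged by the update.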
