Unnecessary dependency on FuzzyTM pulls in many libraries #3423
Labels
bug
Issue described a bug
difficulty easy
Easy issue: required small fix
impact HIGH
Show-stopper for affected users
reach HIGH
Affects most or all Gensim users
Comments
Thanks for reporting! @mpenkov Is FuzzyTM really a hard dependency? If so that's terrible, definitely an omission / bug (or if intentional, done in very bad taste). Let's release a bug fix ASAP.
piskvorky added the bug, difficulty easy, impact HIGH, and reach HIGH labels on Jan 9, 2023
I'm surprised this line is still there, it was part of my first PR. The dependency can be removed from |
This was referenced Jan 9, 2023
Closed
Problem description
I'm trying to upgrade to the new Gensim 4.3.0 release. My colleague @juhoinkinen noticed in NatLibFi/Annif#660 that Gensim 4.3.0 pulls in more dependencies than the previous release 4.2.0, including pandas. I suspect that at least the FuzzyTM dependency (which in turn pulls in pandas) is actually unused and thus unnecessary.
Steps/code/corpus to reproduce
Installing Gensim 4.2.0 into an empty venv (only four packages installed):
Installing Gensim 4.3.0 into an empty venv (18 packages installed):
The size of the venv has grown from 249MB to 318MB, an increase of 69MB.
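For anyone who wants to repeat the package count, here is a minimal sketch that lists the distributions visible to the current interpreter. It uses only the standard library `importlib.metadata`; the helper name `installed_package_names` is made up for illustration, and the count it reports may differ slightly from other tools because pip and setuptools themselves are included.

```python
from importlib.metadata import distributions


def installed_package_names():
    """Return sorted, de-duplicated names of the distributions on sys.path."""
    names = set()
    for dist in distributions():
        # Metadata can be incomplete in broken installs, so tolerate a missing name.
        name = dist.metadata.get("Name")
        if name:
            names.add(name.lower())
    return sorted(names)


if __name__ == "__main__":
    names = installed_package_names()
    print(f"{len(names)} packages installed")
    for name in names:
        print(" ", name)
```

Running it with the venv's own interpreter before and after the upgrade gives two lists that can be compared directly.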
Here is what pipdeptree shows - FuzzyTM appears to be the main reason why so many libraries are pulled in:
It appears that the FuzzyTM dependency was added in PR #3398 (Flsamodel) by @ERijck. The first commits in this PR depended on the library, but a subsequent commit 9fec00b reworked the code so it doesn't need to import FuzzyTM at all. But the dependency in setup.py wasn't actually removed, it's still there: https://github.com/RaRe-Technologies/gensim/blob/f35faae7a7b0c3c8586fb61208560522e37e0e7e/setup.py#L347
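One way to confirm that an installed package still declares a given dependency is to read its metadata. This is a rough sketch, not anything from the Gensim code base: `requirement_names` and `declares_dependency` are hypothetical helpers, the name parsing is a simple regex rather than a full requirement parser, and entries that only apply to an extra (such as test requirements) are skipped.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def requirement_names(requirement_strings):
    """Extract bare project names from requirement strings, skipping extras-only entries."""
    names = []
    for req in requirement_strings:
        # Requirements guarded by an "extra" marker are optional, so ignore them.
        if ";" in req and "extra" in req.split(";", 1)[1]:
            continue
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            names.append(match.group(1))
    return names


def declares_dependency(package, dependency):
    """Return True if the installed `package` lists `dependency` as a hard requirement."""
    try:
        declared = requires(package) or []  # requires() may return None
    except PackageNotFoundError:
        return False
    wanted = dependency.lower()
    return any(name.lower() == wanted for name in requirement_names(declared))


if __name__ == "__main__":
    print(declares_dependency("gensim", "FuzzyTM"))
```

The result depends on which Gensim version is installed in the environment where the script runs.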
I think the FuzzyTM dependency could be safely dropped, as the library is not actually imported. It would reduce the number of libraries Gensim pulls in and thus reduce the size of installations, including Docker images where minimal size is often required.
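The claim that the library is not imported can be checked with a static scan of the source tree. The sketch below is only an illustration under a few assumptions: `imported_top_level_modules` and `files_importing` are hypothetical helper names, the path passed in is whatever local checkout is being inspected, and dynamic imports (for example through `importlib.import_module`) would not be detected.

```python
import ast
from pathlib import Path


def imported_top_level_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import, which never refers to a third-party package.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def files_importing(root, module_name):
    """List .py files under `root` that import `module_name` at any nesting level."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            if module_name in imported_top_level_modules(source):
                hits.append(path)
        except (SyntaxError, UnicodeDecodeError, ValueError):
            # Skip files that cannot be decoded or parsed as Python.
            continue
    return hits


if __name__ == "__main__":
    # "gensim" here is assumed to be the package directory of a local checkout.
    for path in files_importing("gensim", "FuzzyTM"):
        print(path)
```

An empty result for the package directory would support dropping the entry from setup.py, although it does not rule out imports made dynamically at runtime.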
Versions
I'm using Ubuntu Linux 22.04.
Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Bits 64
NumPy 1.24.1
SciPy 1.10.0
gensim 4.3.0
FAST_VERSION 0