Can't add vector to pretrained fasttext model via .add_vector #3224
Hi! Have you found a solution yet? I encounter the same issue with a 4.x-trained Word2Vec (cf. #3114), both when adding new vectors and when updating existing ones. Thanks.
Thanks for reporting. @gojomo any ideas? IIRC you rewrote this part for Gensim 4.0.
Although these are all in the same related area, the traceback for that last one specifically implicates some of the pre-4.0 refactorings. The code clearly needs better test coverage & a deeper look/rework for consistency/correctness, which would probably resolve these (& #3114). I might have some time for this next week. As each of these looks pretty easy to reproduce, anyone wanting to contribute minimal self-contained test cases triggering each of the failures in current code could give any future fixes a running head-start.
Is this problem still unresolved? I have reproduced the error:
from gensim.models import Word2Vec
import numpy
model = Word2Vec(sentences=[
["this", "is", "test1"],
["that", "is", "test2"],
], vector_size=2, min_count=1)
print(model.wv.most_similar("test1", topn=1)) #=> [('test2', 0.9941185712814331)]
model.wv.add_vectors(["test3"], [numpy.array([0.5, 0.5])])
print(model.wv.most_similar("test1", topn=1)) #=> ValueError: operands could not be broadcast together with shapes (6,) (5,)
Do you have a solution? Thanks.
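For anyone hitting the same traceback, here is a quick way to see where the shape mismatch comes from (a sketch of my own, assuming the norms attribute is the cache that fill_norms populates, and an affected gensim 4.x version):
from gensim.models import Word2Vec
import numpy
model = Word2Vec(sentences=[
    ["this", "is", "test1"],
    ["that", "is", "test2"],
], vector_size=2, min_count=1)
model.wv.most_similar("test1", topn=1)                      # triggers fill_norms(), caching norms for the 5 existing keys
print(model.wv.vectors.shape)                               #=> (5, 2)
print(model.wv.norms.shape)                                 #=> (5,)
model.wv.add_vectors(["test3"], [numpy.array([0.5, 0.5])])
print(model.wv.vectors.shape)                               #=> (6, 2)  the vectors grew...
print(model.wv.norms.shape)                                 #=> (5,)   ...but the cached norms did not, hence the broadcast error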
I have found that this can be worked around by calling fill_norms(force=True) after adding the new vectors:
from gensim.models import Word2Vec
import numpy
model = Word2Vec(sentences=[
["this", "is", "test1"],
["that", "is", "test2"],
], vector_size=2, min_count=1)
print(model.wv.most_similar("test1", topn=1)) #=> [('test2', 0.9941185712814331)]
model.wv.add_vectors(["test3"], [numpy.array([0.5, 0.5])])
model.wv.fill_norms(force=True) # added
print(model.wv.most_similar("test1", topn=1)) #=> [('test2', 0.9941185712814331)]
Thanks for the fix! Yes, part of properly finishing/verifying/testing the vector-add code will be making sure the cached norms stay consistent. The docs for the add methods may also deserve extra warnings about how such expansions of a set of vectors interact with cached/derived data like the norms.
@gojomo Thanks for your comment. Does this mean that I should keep calling fill_norms(force=True) myself after adding vectors? Or should the correct specification be that the add methods keep the norms consistent themselves?
In general, it's never guaranteed that the norms have been calculated, so any method that requires them calls fill_norms() first. So I think any method that invalidates anything about the current norms can just discard them. This 'lazy' approach means you can do a bunch of invalidating ops (like sequential adds) in a row, without paying the cost of the full norm-refresh until they're needed again.
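To illustrate the lazy pattern described above, here is a schematic sketch (not gensim's actual implementation; TinyKeyedVectors and its methods are made-up stand-ins):
import numpy as np

class TinyKeyedVectors:
    """Schematic illustration of lazy norm caching/invalidation."""
    def __init__(self, vectors):
        self.vectors = np.asarray(vectors, dtype=np.float32)
        self.norms = None                       # norms are never guaranteed to exist

    def fill_norms(self, force=False):
        # any method that *needs* norms calls this first
        if self.norms is None or force:
            self.norms = np.linalg.norm(self.vectors, axis=1)

    def add_vectors(self, new_vectors):
        # any method that *invalidates* norms just discards them;
        # the next consumer recomputes them lazily
        self.vectors = np.vstack([self.vectors, new_vectors])
        self.norms = None

    def unit_vectors(self):
        self.fill_norms()
        return self.vectors / self.norms[:, np.newaxis]

kv = TinyKeyedVectors([[1.0, 0.0], [0.0, 2.0]])
kv.add_vectors([[0.5, 0.5]])      # several adds in a row cost nothing extra
kv.add_vectors([[3.0, 4.0]])
print(kv.unit_vectors().shape)    #=> (4, 2)  norms recomputed only when needed
Discarding rather than eagerly recomputing keeps repeated adds cheap, which matches the behaviour described above.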
Problem description
I'm trying to add a new vector to a pretrained fasttext model via .add_vector. However, it seems like the vector is not added if I check via .has_index_for.
Steps/code/corpus to reproduce
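A minimal sketch of the kind of reproduction described above (not the original report's code; it trains a tiny FastText model in place of the pretrained one, assuming gensim 4.x):
from gensim.models import FastText
import numpy as np

# tiny stand-in for the pretrained fasttext model
model = FastText(sentences=[
    ["this", "is", "test1"],
    ["that", "is", "test2"],
], vector_size=4, min_count=1)

new_key = "test3"
model.wv.add_vector(new_key, np.random.rand(4).astype(np.float32))

# on affected gensim versions this can print False even though add_vector was called
print(model.wv.has_index_for(new_key))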
Versions