Fix KeyedVectors.add_vectors() error when using most_similar() #3320

Open: wants to merge 1 commit into base: develop


s2terminal

Fixes one of the issues reported in #3224.

If I call most_similar(), then add_vectors(), and then most_similar() again, the second call raises ValueError: operands could not be broadcast together with shapes.
The first most_similar() call caches the per-vector norms in norms; add_vectors() then appends rows to vectors but leaves that cache untouched, so len(vectors) and len(norms) no longer match.

```python
from gensim.models import Word2Vec
import numpy

model = Word2Vec(sentences=[
    ["this", "is", "test1"],
    ["that", "is", "test2"],
], vector_size=2, min_count=1)

print(model.wv.most_similar("test1", topn=1))  #=> [('test2', 0.9941185712814331)]

model.wv.add_vectors(["test3"], [numpy.array([0.5, 0.5])])

print(model.wv.most_similar("test1", topn=1))  #=> ValueError: operands could not be broadcast together with shapes (6,) (5,)
```
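The shape mismatch can be reproduced with plain NumPy, independent of gensim. This is a minimal sketch: `cosine_scores` is a hypothetical helper, and dividing raw dot products by cached row norms is only assumed to resemble what most_similar() does internally.

```python
import numpy as np


def cosine_scores(vectors, norms, query):
    """Cosine similarity of every row of `vectors` to `query`, reusing precomputed row norms."""
    unit_query = query / np.linalg.norm(query)
    return vectors @ unit_query / norms


rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 2))        # five 2-d vectors, like the five-word vocabulary above
norms = np.linalg.norm(vectors, axis=1)  # cache filled on the first similarity query

# Append a sixth vector (as add_vectors() does) without refreshing the cache.
vectors = np.vstack([vectors, [0.5, 0.5]])

try:
    cosine_scores(vectors, norms, vectors[2])  # (6,) divided by (5,) cannot broadcast
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
    print("ValueError:", err)

# Recomputing the norms for every row makes the shapes agree again.
norms = np.linalg.norm(vectors, axis=1)
scores = cosine_scores(vectors, norms, vectors[2])
print(scores.shape)  # (6,)
```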

To resolve this error, I use fill_norms() to bring len(norms) back in line with len(vectors) after new vectors are added.
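The idea behind the fix can be illustrated with a toy class. `ToyVectors` is hypothetical and is not gensim's KeyedVectors; only the names `add_vectors`, `fill_norms`, `most_similar`, `vectors` and `norms` are borrowed from gensim. Forcing a full recompute after an append is one assumed way to keep the cache in sync; resetting `norms` to `None` so it is lazily rebuilt on the next query would also work.

```python
import numpy as np


class ToyVectors:
    """Hypothetical stand-in for KeyedVectors: row vectors plus a lazily filled norms cache."""

    def __init__(self, keys, vectors):
        self.key_to_index = {key: i for i, key in enumerate(keys)}
        self.vectors = np.asarray(vectors, dtype=float)
        self.norms = None  # filled on first use by fill_norms()

    def fill_norms(self, force=False):
        # Compute per-row L2 norms if the cache is empty, or always when force=True.
        if self.norms is None or force:
            self.norms = np.linalg.norm(self.vectors, axis=1)

    def add_vectors(self, keys, weights):
        # Append new keys and rows (duplicate keys are not handled in this sketch).
        for key in keys:
            self.key_to_index[key] = len(self.key_to_index)
        self.vectors = np.vstack([self.vectors, np.asarray(weights, dtype=float)])
        # The fix: refresh an existing cache so its length matches self.vectors.
        if self.norms is not None:
            self.fill_norms(force=True)

    def most_similar(self, key, topn=1):
        self.fill_norms()
        query = self.vectors[self.key_to_index[key]]
        scores = self.vectors @ query / (self.norms * np.linalg.norm(query))
        ranked = sorted(
            ((other, float(scores[i])) for other, i in self.key_to_index.items() if other != key),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return ranked[:topn]


kv = ToyVectors(["this", "is", "test1", "that", "test2"], np.random.default_rng(0).normal(size=(5, 2)))
print(kv.most_similar("test1", topn=1))
kv.add_vectors(["test3"], [np.array([0.5, 0.5])])
print(kv.most_similar("test1", topn=1))  # no ValueError: 6 vectors, 6 cached norms
```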

@mpenkov
Collaborator

mpenkov commented Aug 23, 2023

Needs a test. The example from the issue description is probably good enough.

@mpenkov mpenkov added this to the Spring 2024 release milestone Apr 8, 2024
@mpenkov mpenkov removed this from the Summer 2024 release milestone Jul 18, 2024

2 participants