Documents disappearing when using get_sparse_theta #976
Comments
@r0mainK thanks for reporting this and for linking to more details in your repository. I'll put it on our list to implement additional tests for sparse retrieval of theta and phi, and to check consistency with retrieval of the dense matrix.
Update: I came across a related bug recently. I had not checked the sanity of the theta matrices when retrieving them; it seems this fix actually just created null document rows. However, when retrieving the theta matrix with I also noticed that the bug seemed to appear when inducing sparsity via the For more information you can check this issue Cheers. EDIT: forgot to mention this earlier, but we are using the latest tagged version (0.10.0), built following your guide in a docker instance based off
Hey!

So I've been using the `ARTM` model via the Python API to do some topic modeling, and ran into the following bug: after training the model offline for a couple of iterations, I often saw documents disappear from the `theta` matrix when retrieving it via the `get_sparse_theta` method. The documents in question were the same at each run (for the same seed) and had very low word counts.

Furthermore, I saw that the number of documents would sometimes increase after decreasing, implying the data was still there but not being returned. I was able to get rid of this problem by retrieving the dense matrix directly, by storing it in a `phi` matrix by providing the `theta_name` argument to the model's constructor. As this workaround solved the issue for me, I will not be looking further into this, but I thought you might want to know. There are more details in our repo's tracking issue if you want to check it out. The gist of it is that there is almost certainly a problem when retrieving data as a sparse matrix.

Anyway, cheers
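For anyone who wants to check whether they are affected, here is a minimal sketch of the consistency check between the dense and sparse results. It is plain Python so that it does not depend on the exact return types of the library. The function name `find_lost_documents` and the input layout (a dict for the dense matrix, a list of triplets for the sparse one) are hypothetical, so you would have to convert whatever your dense and sparse retrieval calls return into these shapes first.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def find_lost_documents(
    dense_theta: Dict[str, Sequence[float]],
    sparse_entries: Iterable[Tuple[str, int, float]],
    eps: float = 0.0,
) -> Tuple[List[str], List[str]]:
    """Compare a dense theta with its sparse counterpart (hypothetical helper).

    dense_theta maps a document id to its list of topic weights.
    sparse_entries is an iterable of (document id, topic index, weight).
    Returns (missing, null_rows):
      missing   - documents with non-zero weights in the dense matrix
                  that have no non-zero entry in the sparse one
      null_rows - documents whose dense row is entirely zero
    """
    # Documents that actually carry at least one non-zero sparse entry.
    sparse_docs = {
        doc_id for doc_id, _topic, weight in sparse_entries if abs(weight) > eps
    }

    missing: List[str] = []
    null_rows: List[str] = []
    for doc_id, weights in dense_theta.items():
        if all(abs(w) <= eps for w in weights):
            null_rows.append(doc_id)
        elif doc_id not in sparse_docs:
            missing.append(doc_id)
    return sorted(missing), sorted(null_rows)


# Toy data, made up for illustration only.
dense = {"doc_a": [0.7, 0.3], "doc_b": [0.0, 1.0], "doc_c": [0.0, 0.0]}
sparse = [("doc_a", 0, 0.7), ("doc_a", 1, 0.3)]
missing, null_rows = find_lost_documents(dense, sparse)
```

Null rows are reported separately from missing ones because a document that is all zeros in the dense matrix cannot show up in a sparse result anyway, which is a different symptom from a document that has weights but is not returned.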