
Does the volume of the learned word2box have a negative value? #1

Open
ohashi3399 opened this issue Feb 2, 2023 · 3 comments

@ohashi3399

ohashi3399 commented Feb 2, 2023

Hi, thank you for sharing an implementation of word2box, including the training code.

This question is probably outside your focus, but I would be glad for any advice you can offer.

I trained a word2box model on a Japanese Wikipedia corpus, modifying some parts of the code (mainly replacing the English tasks in the "model_eval" function with Japanese tasks), and confirmed that Spearman's R reached around 0.65 on some tasks.

As a next step, I checked the volumes of some words using my trained word2box model, via the get_volumes function of the BoxEmbedding class in modules.py.
Some words had negative values, with volumes ranging from 113.8046 down to -376.4449.

A negative volume is not intuitive to me, so I suspect I made a mistake somewhere in my training code.
Is it possible for the volume to be negative?

I would be glad for any advice you can give.
Thanks.

@ohashi3399 ohashi3399 changed the title Softplus based volume shows minus values. Does the volume of the learned word2box have a negative value? Feb 2, 2023
@illuminascent

I also attempted training on Japanese Wikipedia. I can confirm the negative volumes, as well as negative similarity scores for words with supposedly "similar" meanings.
After inspecting the implementation, I found that there are no constraints preventing z > Z from happening; if you use the BoxTensor type, this actually happens a lot.
The intersection algorithm also does not consider the case where two boxes do not intersect at all, which likewise causes z > Z in one or more dimensions of the resulting box and hence a negative score (volume).
The author seems to prefer the DeltaBoxTensor type, which explicitly formulates Z = z + softplus(delta); however, after training I also found many negative values in delta, which makes Z ~= z and collapses a lot of box dimensions completely.
I would attempt adding a penalty term for z > Z to the loss function; in the meantime I hope the author can clarify some of these concerns.
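To illustrate the disjoint-box case, here is a minimal sketch in plain Python (not the repo's actual code): a hard min/max intersection of two boxes that do not overlap produces z > Z, so any side length (and volume) derived from it goes negative.

```python
# Hard intersection of two 1-D boxes [z, Z]: z_int = max(z1, z2), Z_int = min(Z1, Z2).
# If the boxes are disjoint, z_int > Z_int and the "side length" becomes negative.
def intersect(box_a, box_b):
    z = max(box_a[0], box_b[0])
    Z = min(box_a[1], box_b[1])
    return z, Z

a = (0.1, 0.3)
b = (0.6, 0.9)          # disjoint from a
z, Z = intersect(a, b)
side = Z - z            # negative: the intersection box is degenerate
```

With a = (0.1, 0.3) and b = (0.6, 0.9), the intersection is (0.6, 0.3) and the side length is -0.3; a product of such side lengths across dimensions can then yield a negative volume.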
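A quick sketch of the collapse effect under the Z = z + softplus(delta) parameterization described above (illustrative stdlib code, not the repo's implementation): softplus keeps the side length strictly positive, but a very negative delta drives it toward zero, so the box flattens in that dimension.

```python
import math

def softplus(x):
    # softplus(x) = log(1 + e^x): always > 0, but vanishes for very negative x
    return math.log1p(math.exp(x))

# DeltaBoxTensor-style side length in one dimension: Z = z + softplus(delta)
z = 0.4
healthy_side = softplus(1.0)     # a usable side length
collapsed_side = softplus(-5.0)  # tiny: Z ~= z, this dimension has collapsed
```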

@ohashi3399

Thank you for the very helpful observations and insights. Indeed, there are no constraints preventing z > Z. I will draw up a plan for improvements while waiting for the author's answer. Thanks.

@illuminascent

After examining all the code and reading the paper the author referenced, I think a negative volume score is by design and not a bug. The volume is computed in log space, as sum_d( log(expected side length in dimension d) ), and all Gumbel boxes are supposed to live on the confined support [0,1]^d (and are also initialized that way). The maximum volume a box can therefore have is 1, which makes the log volume have a maximum of 0. The fact that some of your volumes are positive may suggest that some bounds moved well outside their supposed support, which the author did not constrain them from doing.
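A small sketch of this argument (plain Python, assuming a simple hard box rather than the Gumbel relaxation): for any box inside [0,1]^d, every side length is at most 1, so every log term is at most 0 and the log-volume cannot be positive.

```python
import math

# Log-volume of a box as the sum over dimensions of log(side length).
# For a box inside [0,1]^d each side is <= 1, so each term is <= 0.
def log_volume(z, Z):
    return sum(math.log(Zi - zi) for zi, Zi in zip(z, Z))

inside = log_volume([0.1, 0.2], [0.4, 0.9])    # sides < 1 -> negative log-volume
unit = log_volume([0.0, 0.0], [1.0, 1.0])      # the unit box -> exactly 0
escaped = log_volume([-1.0, 0.0], [2.0, 1.5])  # sides > 1 -> positive log-volume
```

A positive log-volume, as in the last case, is only possible once some sides exceed length 1, i.e. once the box has escaped the [0,1]^d support.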

I also think the reason a lot of the DeltaBoxTensors end up with negative delta values is that the author used "init_small" when initializing the BoxEmbeddings: the initialized boxes are randomly located in [0,1]^d (z ~ U[0,1]), but each side has a fixed length Z - z = 0.0541. This value is very close to 0, so it shouldn't be a surprise that after some unconstrained updates some of the deltas become negative. A clarification from the author would be very helpful, as there are other initialization methods implemented but not used. In any case, changing the initial side length to something larger than 0.0541 may help prevent negatives and leave more dimensions uncollapsed.
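To see why a fixed initial side of 0.0541 sits so close to collapse, here is an illustrative computation (the inv_softplus helper is mine, not from the repo): under Z = z + softplus(delta), a side of 0.0541 corresponds to an initial delta of about -2.89, already deep in the flat region of softplus where small gradient steps can push it arbitrarily negative.

```python
import math

def inv_softplus(y):
    # inverse of softplus: x = log(e^y - 1), defined for y > 0
    return math.log(math.expm1(y))

delta0 = inv_softplus(0.0541)          # roughly -2.89
side0 = math.log1p(math.exp(delta0))   # softplus(delta0) recovers ~0.0541
```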
