weights_logits (dim 4): MultiOutSizeLinear.forward out is always zero #44

In MultiOutSizeLinear.forward, self.out_features_ls is [32, 64, 128, 256, 512], because the weights_logits dimension of 4 is multiplied into [8, 16, 32, 64, 128]. When out_feat_size is 8, torch.eq(out_feat_size, feat_size).unsqueeze(-1) is always False, and then out is always zero. Is that right?
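Here is a minimal standalone check of what I mean (the names are simplified from the code, not an exact copy):

```python
import torch

# Scaled output sizes that end up in self.out_features_ls:
out_features_ls = [32, 64, 128, 256, 512]   # 4 * [8, 16, 32, 64, 128]

# out_feat_size holds the patch size of each token, here all patch size 8.
out_feat_size = torch.full((2, 3), 8)

masks = [torch.eq(out_feat_size, feat_size) for feat_size in out_features_ls]
print([m.any().item() for m in masks])  # [False, False, False, False, False]
# No mask ever matches patch size 8 (or 16), so the masked sum is all zeros.
```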
Comments

self.out_features_ls should be [8, 16, 32, 64, 128] based on the current hyperparameters. Not too sure what the weights_logits you are referring to is. out_feat_size is a tensor representing the patch size for each token. torch.eq(...) behaves as a mask and only adds the current feat_size to out. So, out should be the prediction of each token based on the appropriate patch size, with zero padding.
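Roughly, the intended behaviour is something like this sketch (simplified, not the exact MultiOutSizeLinear implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiOutSizeLinearSketch(nn.Module):
    """Sketch: projects each token to one of several output sizes, chosen by
    the token's patch size, zero-padded up to the largest output size."""

    def __init__(self, in_features: int, out_features_ls: list[int]):
        super().__init__()
        self.out_features_ls = out_features_ls
        self.max_out = max(out_features_ls)
        # One weight/bias pair per candidate output size.
        self.weights = nn.ParameterList(
            nn.Parameter(torch.randn(feat_size, in_features) * 0.02)
            for feat_size in out_features_ls
        )
        self.biases = nn.ParameterList(
            nn.Parameter(torch.zeros(feat_size)) for feat_size in out_features_ls
        )

    def forward(self, x: torch.Tensor, out_feat_size: torch.Tensor) -> torch.Tensor:
        out = 0
        for weight, bias, feat_size in zip(self.weights, self.biases, self.out_features_ls):
            # Mask is True only for tokens whose patch size equals this feat_size,
            # so each token only receives the projection for its own patch size.
            mask = torch.eq(out_feat_size, feat_size).unsqueeze(-1)
            proj = F.pad(F.linear(x, weight, bias), (0, self.max_out - feat_size))
            out = out + mask * proj
        return out


# With the correct list [8, 16, 32, 64, 128] the masks fire; with the scaled
# [32, 64, 128, 256, 512] and patch sizes 8/16 in out_feat_size they never do.
layer = MultiOutSizeLinearSketch(in_features=16, out_features_ls=[8, 16, 32, 64, 128])
x = torch.randn(2, 3, 16)
patch_sizes = torch.tensor([[8, 8, 16], [32, 128, 8]])
print(layer(x, patch_sizes).shape)  # torch.Size([2, 3, 128])
```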
In the DistrParamProj.__init__ function, print(args_dim) shows that when dim is 4 (the weights_logits parameter), the tuple is [32, 64, 128, 256, 512].
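If I am reading it right, those values come from something like this (my reconstruction with hypothetical names, not the actual DistrParamProj code):

```python
# Hypothetical reconstruction of where [32, 64, 128, 256, 512] comes from:
patch_sizes = [8, 16, 32, 64, 128]   # candidate patch sizes
weights_logits_dim = 4               # args_dim entry for weights_logits

scaled_out_features = [weights_logits_dim * p for p in patch_sizes]
print(scaled_out_features)           # [32, 64, 128, 256, 512]
# These scaled sizes end up as MultiOutSizeLinear.out_features_ls, while
# forward still compares against the raw patch size (8, 16, ...).
```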
I see... I think I get what you mean, will look into it, thanks!
Seems like this is a pretty major bug; fixing it would make predictions with patch sizes 8 and 16 (with the current configuration) have better outputs, and improve performance for low-frequency data. Thanks for catching this!
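One possible direction for the fix (just a sketch of the idea under my assumptions, not an actual patch) would be to keep masking on the base patch size while still projecting to dim * patch_size features:

```python
import torch
import torch.nn.functional as F

def fixed_forward_sketch(x, out_feat_size, patch_sizes, weights, biases, dim):
    """Hypothetical fix: pick the weight by the *base* patch size (8, 16, ...),
    while the projection still outputs dim * patch_size features, zero-padded."""
    max_out = dim * max(patch_sizes)
    out = 0
    for weight, bias, patch_size in zip(weights, biases, patch_sizes):
        # Compare against the raw patch size instead of the scaled output size.
        mask = torch.eq(out_feat_size, patch_size).unsqueeze(-1)
        proj = F.pad(F.linear(x, weight, bias), (0, max_out - dim * patch_size))
        out = out + mask * proj
    return out

# Tiny check with patch sizes [8, 16] and dim=4 (weights_logits):
ws = [torch.randn(4 * p, 10) for p in (8, 16)]
bs = [torch.zeros(4 * p) for p in (8, 16)]
y = fixed_forward_sketch(torch.randn(2, 5, 10), torch.full((2, 5), 8), [8, 16], ws, bs, dim=4)
print(y.shape, (y != 0).any())  # torch.Size([2, 5, 64]) tensor(True)
```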