about B-TA #9
Comments
Hello. In Section 3.2 of the paper, which introduces Mask Generation, the author says that the elements of the mask lie in the range 0 to 1. I hope this answers your question.
My point of confusion is this: in the background mask the object is 0 and the background is 1, which is the opposite of the feature map. The product would therefore be all zeros and would not serve to enhance the features.
The author takes the model's own feature maps (E5, D4, D3, D2) and reduces the number of channels from 128 to 1 with a 3x3 convolution. The elements of the resulting mask lie in the range 0 to 1. You can see this in the code in decoder_p.py.
I know how the mask is generated. What I want to know is why the background mask can be multiplied with Q and K, which doesn't quite make sense to me.
I think your question is a valuable one. Going by the author's model diagram, the mask here should be foreground = 1, background = 0, so that the QK matrix multiplication focuses only on the correlation between foreground pixels. (If the foreground is 0 and the background is 1, it attends only to the relationships among background pixels; I think the two cases are roughly equivalent.) The important point, however, is that either the foreground pixels or the background pixels are 0 in the QK product. This is fatal, because when QK is multiplied by V, either the foreground or the background ends up all 0. The output mask is then computed only from relationships between pixels inside the foreground of the input mask, which is illogical for a coarse-to-fine process.
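To make the argument above concrete, here is a minimal toy sketch in plain Python. It is not the author's implementation: the 4-pixel, 2-channel `Q`, `K`, `V` values, the binary `fg_mask`, and the omission of softmax are all assumptions made for illustration. It shows that when Q and K are both scaled by the background mask, every attention score involving a foreground pixel is 0, and the output rows for foreground pixels are 0 as well.

```python
# Toy illustration only: shapes, values, and the absence of softmax are assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def apply_mask(feats, mask):
    """Scale each pixel's feature vector by that pixel's mask value."""
    return [[v * m for v in row] for row, m in zip(feats, mask)]

# 4 pixels, 2 channels each (hypothetical values).
Q = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
V = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

fg_mask = [1.0, 1.0, 0.0, 0.0]        # pixels 0 and 1 belong to the object
bg_mask = [1.0 - m for m in fg_mask]  # background mask obtained by subtraction

# Scores computed from the masked Q and K.
scores = matmul(apply_mask(Q, bg_mask), transpose(apply_mask(K, bg_mask)))
out = matmul(scores, V)  # no softmax in this sketch

for row in scores:
    print(row)
print(out)
```

Rows and columns 0 and 1 of `scores` (the foreground pixels) come out as zeros, so only background-to-background correlations survive. If a softmax were applied to `scores` before multiplying by V, the zero rows would turn into uniform weights rather than zeros, so whether the output is literally zero depends on details this sketch does not cover.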
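Also, supposing this really is the intended logic: with a ResNet-50 backbone and a resolution of 384^2, the model reaches a result of 0.29 on COD10K, a figure that puts it far ahead of other models. I am very much looking forward to the full code being released.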
Thank you for your excellent work. I have some confusion about B-TA: the background map, which the paper generates by subtraction, is multiplied with Q and K. In the background map the object is 0, whereas in Q and K the object is 1. Wouldn't the multiplication make the generated feature map all 0?
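The question can be checked on a small example. The sketch below is a hypothetical illustration, not code from the repository: the `feature` activations and the soft `fg_mask` values are made up. It assumes the mask is soft (values between 0 and 1, as discussed in this thread) and that the features at background positions are not exactly 0.

```python
# Hypothetical 1-D "feature map" over 4 pixels; the first two pixels are the object.
feature = [0.9, 0.8, 0.3, 0.2]

# Soft foreground mask in [0, 1] (made-up values).
fg_mask = [1.0, 0.9, 0.1, 0.0]

# Background mask obtained by subtraction.
bg_mask = [1.0 - m for m in fg_mask]

# Element-wise product of the feature map with the background mask.
bg_feature = [f * m for f, m in zip(feature, bg_mask)]
print(bg_feature)
```

Under these assumptions the product is suppressed at the object positions but stays non-zero at the background positions. It becomes all zeros only in the stricter case the question describes, where the features are exactly 0 wherever the background mask is non-zero.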