
about B-TA #9

Open
Qiublack opened this issue Apr 8, 2023 · 6 comments

Comments

@Qiublack

Qiublack commented Apr 8, 2023

Thank you for your excellent work. I have some confusion about B-TA: the background map generated in the paper via subtraction is multiplied with Q and K. The object region in the background map is 0, while the object region in Q and K is 1. Wouldn't the multiplication make the generated feature map all 0?
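A minimal sketch of the situation I mean, assuming the mask is multiplied element-wise with Q and K before the attention product (the shapes and names here are hypothetical, not taken from the repo):

```python
import torch

# Toy example: 4 tokens, the first 2 belong to the object.
# Background map as I read the paper: object = 0, background = 1.
bg_mask = torch.tensor([0., 0., 1., 1.]).unsqueeze(-1)   # (tokens, 1)

# Suppose Q and K respond strongly on the object (~1) and weakly elsewhere (~0).
Q = torch.tensor([1., 1., 0., 0.]).unsqueeze(-1)
K = torch.tensor([1., 1., 0., 0.]).unsqueeze(-1)

# Element-wise masking before the attention product, as I understand B-TA.
Q_masked = Q * bg_mask
K_masked = K * bg_mask

attn = Q_masked @ K_masked.transpose(0, 1)   # (tokens, tokens)
print(attn)   # all zeros in this extreme case, which is exactly my concern
```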

@Lwt-diamond

Lwt-diamond commented Apr 8, 2023

Hello, in Section 3.2 of the paper, which introduces Mask Generation, the author says that the elements in the mask are in the range of 0 to 1. I wonder if this answers your question.

@Qiublack
Author

Qiublack commented Apr 8, 2023

Hello, in Section 3.2 of the paper, which introduces Mask Generation, the author says that the elements in the mask are in the range of 0 to 1. I wonder if this answers your question. (In Mask Generation the author says a 3x3 convolution is used to generate the masks, but I don't seem to see it in the code.)

The point of confusion for me is this: the object in the background mask is 0 and the background is 1, which is the opposite of the feature map. Therefore the multiplication will be all zeros and will not enhance the features.

@Lwt-diamond

The author uses the self-generated feature maps (E5, D4, D3, D2) and then converts the number of channels from 128 to 1 through a 3x3 convolution. The elements of the generated mask map are in the range 0 to 1. You can see this in the code in decoder_p.py.
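A minimal sketch of that step (hypothetical, not copied from decoder_p.py; the sigmoid is my assumption for keeping the values in the 0-to-1 range):

```python
import torch
import torch.nn as nn

# A 3x3 convolution reduces the 128-channel feature map to a single channel,
# and a sigmoid keeps every element of the mask in (0, 1).
mask_head = nn.Sequential(
    nn.Conv2d(128, 1, kernel_size=3, padding=1),
    nn.Sigmoid(),
)

feat = torch.randn(1, 128, 24, 24)   # e.g. one of E5 / D4 / D3 / D2
mask = mask_head(feat)               # shape (1, 1, 24, 24), values in (0, 1)
print(mask.min().item(), mask.max().item())
```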

@Qiublack
Author

Qiublack commented Apr 8, 2023

The author uses the self-generated feature maps (E5, D4, D3, D2) and then converts the number of channels from 128 to 1 through a 3x3 convolution. The elements of the generated mask map are in the range 0 to 1. You can see this in the code in decoder_p.py.

I know how the mask is generated. What I want to know is why the background mask can be multiplied with the QK, which doesn't quite make sense.

@SilentWhiteRabbit

I feel that your question is valuable. As far as the author's model diagram is concerned, the mask here should have foreground = 1 and background = 0, so the QK matrix multiplication only attends to the correlations among foreground pixels (if instead the foreground is 0 and the background is 1, it only attends to the relationships within the background; of course, I think the two cases amount to the same thing)... However, the important point is that in the QK product either the foreground pixels or the background pixels are 0. This is fatal, because when QK is multiplied by V, either the foreground or the background of the result ends up all 0. The output mask then simply computes the relationships among pixels inside the foreground of the input mask, which is illogical for a coarse-to-fine process.
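A toy numerical check of what I mean, assuming a hard 0/1 mask is applied to Q and K (the softmax and scaling are left out so the zeros are easy to see; all names are hypothetical):

```python
import torch

torch.manual_seed(0)
n, d = 6, 8
fg_mask = torch.tensor([1., 1., 1., 0., 0., 0.]).unsqueeze(-1)  # foreground = 1

Q = torch.randn(n, d) * fg_mask   # background rows of Q become 0
K = torch.randn(n, d) * fg_mask   # background rows of K become 0
V = torch.randn(n, d)

attn = Q @ K.t()                  # any entry touching a background token is 0
out = attn @ V                    # so the background rows of the output are all 0
print(out.abs().sum(dim=1))       # the last three entries are exactly 0
```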

@SilentWhiteRabbit


Also, even supposing this could be made logical: with a resnet-50 backbone and 384^2 resolution, the reported result of 0.29 on COD10K opens up a gap over all the other models. I am really looking forward to the release of the full code.
