I want to visualize the attention weights of the decoder module. I take the output of `multihead_attn` in the last layer of the decoder, but its shape is (bs, 360, 36*h*w), where h*w is the size of the feature map. I don't understand why there are 36 different attention weights for the same instance of the same frame, as the picture shows.
Can you explain what this means?
Hi @sally1913105, we compute both spatial and temporal attention, so for a 36-frame sequence there are 36 attention weights for each prediction, even though the prediction is for a specific frame. In this way, features from the other frames can help the segmentation of that frame.
Thank you for your answer! Can I think of it this way: within the 36 attention weights of the i-th prediction, only the i-th attention weights correspond to the i-th frame's features, and the others correspond to other frames' features? But how are these 36 attention weights combined?
Hi @sally1913105, for each prediction we only use the attention weights of the corresponding frame at this stage. The weights do not need to be combined; interaction with other frames is realized by the subsequent 3D convolutions.
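For visualization, this answer suggests reshaping the flattened (36*h*w) axis into (frames, h, w) and keeping only the map of the frame the prediction belongs to. Here is a minimal NumPy sketch of that selection; the tensor here is random stand-in data, and the concrete sizes (bs, 360 queries, 36 frames, h, w) are assumptions taken from the shape quoted in the question:

```python
import numpy as np

# Assumed shapes from the discussion above:
# bs = batch size, Q = 360 queries, T = 36 frames, h*w = feature-map size.
bs, Q, T, h, w = 1, 360, 36, 12, 20

# Stand-in for the multihead_attn output of shape (bs, 360, 36*h*w);
# random values are used here purely to demonstrate the reshape/selection.
attn = np.random.rand(bs, Q, T * h * w)

# Split the flattened last axis into (frames, height, width).
attn = attn.reshape(bs, Q, T, h, w)

# For a prediction belonging to frame i, keep only that frame's map,
# as described in the answer (no combination across frames here).
i = 5
attn_map_i = attn[:, :, i]  # shape (bs, Q, h, w)

print(attn_map_i.shape)
```

Each (h, w) slice of `attn_map_i` can then be shown as a heatmap (e.g. with `matplotlib.pyplot.imshow`), optionally upsampled to the input resolution.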