
Reproduce the GAT v1 attention matrix #5

Open
ALEX13679173326 opened this issue May 12, 2022 · 3 comments

Comments

@ALEX13679173326

Thanks for your great contribution!
I'm confused about Figure 1(a) in your paper. Which layer of GAT does this attention matrix come from? Is the attention matrix the same across all layers? And does the attention matrix of each head within one layer look like this?

Best regards

@shakedbr
Collaborator

Hi @ALEX13679173326 !
Thank you for your interest in our work!

This is one of the heads of a single layer of GAT/GATv2, trained on the DictionaryLookup problem (Figure 2).
Regarding different layers: this problem can be solved with a single layer, so we trained only one, but the same pattern would appear in every layer of a deeper model (possibly with a different argmax key), because GAT simply cannot express any other pattern.
Regarding different heads: the figure visualizes just one head, but all the other heads exhibit the same pattern, again because GAT cannot express any other pattern.
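To make the "cannot express any other pattern" point concrete, here is a small illustrative sketch (not code from this repository) of the GATv1 scoring function. The score e(i, j) = LeakyReLU(a1·h_i + a2·h_j) decomposes into a query term plus a key term, and LeakyReLU is monotone, so every query ranks the keys identically:

```python
import random

random.seed(0)
n, d = 6, 8  # number of nodes, feature dimension

def randvec():
    return [random.gauss(0, 1) for _ in range(d)]

H = [randvec() for _ in range(n)]   # node features (already W-projected)
a1, a2 = randvec(), randvec()       # the two halves of the attention vector a

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

# GATv1 score: e[i][j] = LeakyReLU(a1 . h_i + a2 . h_j)
scores = [[leaky_relu(dot(a1, hi) + dot(a2, hj)) for hj in H] for hi in H]

# The key term a2 . h_j is independent of the query i, so every row of the
# score matrix has its maximum at the SAME key -- "static" attention.
best = [max(range(n), key=lambda j: scores[i][j]) for i in range(n)]
print(best)  # all entries are identical
```

This is exactly the uniform-argmax pattern the figure shows: softmax normalization per row does not change the per-row argmax.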

Does that answer your questions? Feel free to let us know if anything is unclear.

@ALEX13679173326
Author

Thanks very much for your reply!!

Recently, I found the same pattern in the attention matrices of ViT (Vision Transformer), which also uses the self-attention mechanism. If we regard ViT as a graph model, I think this phenomenon may be connected to GAT.
So, can I use the code in this repository to generate the result in Figure 1(a)? If not, could you release the related code?

In my humble opinion, the phenomenon in Figure 1(a) may be related to some underlying weakness of the self-attention mechanism. Have you investigated its cause?

Thanks again!

@shakedbr
Collaborator

Our main analysis is of the GAT formulation.
In the appendix of our paper, you can find an additional analysis of dot-product attention (e.g., as used in Transformers).
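For contrast with the GAT case, here is a tiny illustrative sketch (again, not code from the paper) of why dot-product attention is not static in the same way: the score q_i · k_j couples query and key, so different queries can select different keys. The vectors below are hand-picked purely for illustration:

```python
# Two queries and two keys in 2-D, chosen so each query prefers a different key.
Q = [[1.0, 0.0], [0.0, 1.0]]  # queries
K = [[1.0, 0.0], [0.0, 1.0]]  # keys

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Dot-product score: e[i][j] = q_i . k_j (query-dependent ranking over keys)
scores = [[dot(q, k) for k in K] for q in Q]
best = [max(range(len(K)), key=lambda j: scores[i][j]) for i in range(len(Q))]
print(best)  # -> [0, 1]: the top key depends on the query
```

Whether a trained model actually uses this extra expressiveness is a separate question, which is what the appendix analysis addresses.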
