question about cross-attention map #46

llllly26 · 2024-10-29T13:42:52Z

Hi, I see that the cross-attention map is computed between the text tokens and the latent image representation. Why, then, is the visualization shown between text and image pixels? (The code in p2p does the same thing.)
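
To make the question concrete, here is a minimal sketch of how I understand the computation. All names, shapes, and sizes are hypothetical and not taken from this repository or from p2p. Each row of the map corresponds to one latent position, so one token's column can be reshaped to the latent grid. Upsampling that grid to the image size is one way such a map could end up displayed over pixels, though I have not confirmed that this is what the visualization code does.

```python
import math
import random


def cross_attention_map(latent_queries, text_keys):
    """Return softmax(Q K^T / sqrt(d)): one row per latent position, one column per token."""
    dim = len(text_keys[0])
    rows = []
    for q in latent_queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in text_keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def token_map_at_image_size(attn, token_index, latent_side, image_side):
    """Reshape one token's column to the latent grid, then nearest-neighbour upsample it."""
    factor = image_side // latent_side
    grid = [
        [attn[r * latent_side + c][token_index] for c in range(latent_side)]
        for r in range(latent_side)
    ]
    return [
        [grid[y // factor][x // factor] for x in range(image_side)]
        for y in range(image_side)
    ]


# Hypothetical sizes: a 4x4 latent grid, 3 text tokens, a 32x32 "image".
random.seed(0)
latent_side, image_side, n_tokens, dim = 4, 32, 3, 8
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(latent_side * latent_side)]
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

attn = cross_attention_map(queries, keys)  # shape: (16, 3), defined on latents
pixel_map = token_map_at_image_size(attn, 1, latent_side, image_side)  # shape: (32, 32)
print(len(attn), len(attn[0]), len(pixel_map), len(pixel_map[0]))
```

In this sketch the attention values themselves exist only at latent resolution; the pixel-sized map is just a blown-up copy of them.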
