Thank you for your excellent work on this project! I’ve come across some discussions about unconditional embeddings, but I have a specific question regarding their implementation.
What is the difference between setting the CLIP image embeddings directly to zero tensors and passing a black image (or a noised one) through the CLIP vision model? More specifically, what semantic meaning (if any) is encoded in the zeroed image embeddings?
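To make the question concrete, here is a minimal sketch of the two approaches I'm comparing, using `transformers` and `openai/clip-vit-large-patch14` purely as an illustrative encoder (the model id and the 224×224 size are just assumptions for the example, not necessarily what this project uses):

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Example encoder for illustration only; swap in the project's actual encoder.
model_id = "openai/clip-vit-large-patch14"
image_encoder = CLIPVisionModelWithProjection.from_pretrained(model_id)
processor = CLIPImageProcessor.from_pretrained(model_id)

# Approach 1: use an all-zero tensor directly as the unconditional embedding.
embed_dim = image_encoder.config.projection_dim
zero_embeds = torch.zeros(1, embed_dim)

# Approach 2: actually encode a black image through the CLIP vision model.
# Note that the processor's normalization maps black pixels to non-zero values,
# and the ViT still applies positional embeddings, attention, LayerNorm, etc.,
# so the result is a valid point in CLIP embedding space rather than the origin.
black_image = Image.new("RGB", (224, 224), (0, 0, 0))
inputs = processor(images=black_image, return_tensors="pt")
with torch.no_grad():
    black_embeds = image_encoder(**inputs).image_embeds

# The two unconditional embeddings generally differ.
print(torch.allclose(zero_embeds, black_embeds))  # typically False
```

My understanding is that the black-image embedding is something the model has (at least implicitly) seen during training, whereas the zero tensor is not the encoding of any real image, which is why I'm curious what semantics, if any, the zeroed embedding carries.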