Question Regarding the Difference Between Zero Tensors and Black Image Embeds in CLIP Vision Model #463

@jiagongfu

Thank you for your excellent work on this project! I’ve come across some discussions about unconditional embeddings, but I have a specific question regarding their implementation.

What is the difference between setting the CLIP image embeddings directly to zero tensors and passing a black image (or a noised one) through the CLIP vision model? More specifically, what semantic meaning, if any, is encoded in the zeroed image embeddings?
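
To make the two options concrete, here is a minimal PyTorch sketch. `ToyVisionEncoder` is a hypothetical, randomly initialized stand-in and not this project's CLIP vision model; the image size, widths, and layer choices are assumptions made only for illustration. It shows that a black image still passes through pixel normalization, the class token, and the position embeddings, so the encoder returns a non-zero vector, whereas a zero tensor bypasses the encoder entirely.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Per-channel mean/std used by OpenAI CLIP preprocessing.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)


class ToyVisionEncoder(nn.Module):
    """Hypothetical ViT-like stand-in for a CLIP vision tower (random weights)."""

    def __init__(self, image_size=32, patch_size=8, width=16, embed_dim=8):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.patch_embed = nn.Conv2d(3, width, patch_size, stride=patch_size, bias=False)
        self.class_token = nn.Parameter(torch.randn(width))
        self.pos_embed = nn.Parameter(torch.randn(num_patches + 1, width))
        self.block = nn.TransformerEncoderLayer(
            d_model=width, nhead=2, dim_feedforward=32, dropout=0.0, batch_first=True
        )
        self.norm = nn.LayerNorm(width)
        self.proj = nn.Linear(width, embed_dim, bias=False)

    def forward(self, pixel_values):
        # (B, 3, H, W) -> (B, num_patches, width)
        x = self.patch_embed(pixel_values).flatten(2).transpose(1, 2)
        cls = self.class_token.expand(x.shape[0], 1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.block(x)
        # Project the normalized class-token state to the embedding space.
        return self.proj(self.norm(x[:, 0]))


encoder = ToyVisionEncoder().eval()

black = torch.zeros(1, 3, 32, 32)              # black RGB image in [0, 1]
pixel_values = (black - CLIP_MEAN) / CLIP_STD  # non-zero after normalization

with torch.no_grad():
    black_embeds = encoder(pixel_values)       # option 1: encode a black image
zero_embeds = torch.zeros_like(black_embeds)   # option 2: skip the encoder

print("black-image embedding norm:", black_embeds.norm().item())
print("zero embedding norm:", zero_embeds.norm().item())
```

In this sketch the black-image embedding is an ordinary output of the encoder, while the zero tensor is a fixed vector that the encoder never produced for this input. Which of the two a model treats as "unconditional" depends on how it was trained, which this toy example does not address.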
