zer0int/vit-visualization

This fork adds large CLIP ViT models.

Available models:

  • 94: CLIP0_RN50x4
  • 95: CLIP1_RN50x16
  • 96: CLIP2_ViT-B/32
  • 97: CLIP3_ViT-B/16
  • 98: CLIP4_ViT-L/14
  • 99: CLIP5_ViT-L/14@336px

Use /experiments/it15/runvis98.py to visualize all features, or a [range] of features, for a given layer; a hedged sweep sketch follows the notes below.

  • Take a holiday and leave it running; it literally takes days to visualize all features, even on an RTX 4090.
  • Requires <10 GB VRAM, though, even for CLIP5_ViT-L/14@336px!
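The exact command-line flags of runvis98.py are not documented here, so the following is only a hedged sketch: it sweeps a small range of feature indices by repeatedly invoking the per-feature vis98.py CLI described further down. The layer, network number, and feature range are placeholder values.

import os
import subprocess

# Placeholder values (assumption): layer 20 and network 99
# (CLIP5_ViT-L/14@336px in the model list above); sweep features 0..15.
LAYER = 20
NETWORK = 99
TV_COEFF = 0.1
env = {**os.environ, "PYTHONPATH": "."}

for feature in range(16):
    subprocess.run(
        ["python", "experiments/it15/vis98.py",
         "-l", str(LAYER), "-f", str(feature),
         "-n", str(NETWORK), "-v", str(TV_COEFF)],
        env=env,
        check=True,  # stop on the first failing run
    )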

For example: CLIP5_ViT-L/14@336px, Layer 20: Features F1, F223, F670, F672


For the originally implemented models, see the original README.md below:


What do Vision Transformers Learn? A Visual Exploration

Visualizations

ViT Models:

To visualize the features of the ViT models:

PYTHONPATH=. python experiments/it15/vis35.py -l <layer_number> -f <feature_number> -n <network_number> -v <tv_coefficient>

For example:

PYTHONPATH=. python experiments/it15/vis35.py -l 4 -f 20 -n 35 -v 0.1

CLIP Models:

To visualize the features of the CLIP models:

PYTHONPATH=. python experiments/it15/vis98.py -l <layer_number> -f <feature_number> -n <network_number> -v <tv_coefficient>

For example:

PYTHONPATH=. python experiments/it15/vis98.py -l 4 -f 20 -n 98 -v 0.1

For the ViT models, the -n option should be in [34, 35, 36, 37, 38, 39]; for the CLIP models, it should be in [94, 95, 96, 97, 98, 99].

To list all the available network numbers use:

python show_models.py

Here we list some of them:

  • 34: ViT0_B_16_imagenet1k
  • 35: ViT1_B_32_imagenet1k
  • 36: ViT2_L_16_imagenet1k
  • 37: ViT3_L_32_imagenet1k
  • 38: ViT4_B_16
  • 39: ViT5_B_32
  • 94: CLIP0_RN50
  • 95: CLIP1_RN101
  • 96: CLIP2_RN50x4
  • 97: CLIP3_RN50x16
  • 98: CLIP4_ViT-B/32
  • 99: CLIP5_ViT-B/16

We use the timm library to load the pretrained models. After running these commands, you can find the visualizations in the desktop folder.
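As a minimal sketch of that timm loading step (the exact timm model names the repo uses are an assumption here):

import timm
import torch

# Load a pretrained ViT-B/16 from timm (model name is an assumed example).
model = timm.create_model("vit_base_patch16_224", pretrained=True)
model.eval()

# One dummy forward pass to confirm the weights load and run.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) for an ImageNet-1k head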

Other experiments done in the paper can be found in the experiments folder.

For experiments that need to load the ImageNet dataset, such as the isolating-CLS experiment, the code assumes the dataset is in data/imagenet/train for the training set and data/imagenet/val for the validation set.
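A minimal sketch of that directory layout, assuming the standard torchvision ImageFolder convention (one subfolder per class); the repo's actual transforms may differ:

from torchvision import datasets, transforms

# Basic preprocessing as an example; adjust to the experiment's needs.
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/imagenet/train", transform=tfm)
val_set = datasets.ImageFolder("data/imagenet/val", transform=tfm)
print(len(train_set), len(val_set))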

We will update the README with more instructions on how to run the other experiments soon.
