zer0int/vit-visualization

This fork adds large CLIP ViT models.

Available models:

  • 94: CLIP0_RN50x4
  • 95: CLIP1_RN50x16
  • 96: CLIP2_ViT-B/32
  • 97: CLIP3_ViT-B/16
  • 98: CLIP4_ViT-L/14
  • 99: CLIP5_ViT-L/14@336px

Use /experiments/it15/runvis98.py to visualize all features, or a [range] of features, for a given layer; a hedged sweep sketch follows the notes below.

  • Take a holiday and leave it running; it literally takes days to visualize all features, even on an RTX 4090.
  • Requires <10 GB VRAM, though, even for CLIP5_ViT-L/14@336px!
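The exact command-line flags of runvis98.py are not documented here, so the following is only a hedged sketch: it sweeps a small range of feature indices by repeatedly invoking the per-feature vis98.py CLI described further down. The layer, network number, and feature range are placeholder values.

import os
import subprocess

# Placeholder values (assumption): layer 20 and network 99
# (CLIP5_ViT-L/14@336px in the model list above); sweep features 0..15.
LAYER = 20
NETWORK = 99
TV_COEFF = 0.1
env = {**os.environ, "PYTHONPATH": "."}

for feature in range(16):
    subprocess.run(
        ["python", "experiments/it15/vis98.py",
         "-l", str(LAYER), "-f", str(feature),
         "-n", str(NETWORK), "-v", str(TV_COEFF)],
        env=env,
        check=True,  # stop on the first failing run
    )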

For example: CLIP5_ViT-L/14@336px, Layer 20: Features F1, F223, F670, F672


For the originally implemented models, see the original README.md below:


What do Vision Transformers Learn? A Visual Exploration

Visualizations

ViT Models:

To visualize the features of the ViT models:

PYTHONPATH=. python experiments/it15/vis35.py -l <layer_number> -f <feature_number> -n <network_number> -v <tv_coefficient>

For example:

PYTHONPATH=. python experiments/it15/vis35.py -l 4 -f 20 -n 35 -v 0.1

CLIP Models:

To visualize the features of the CLIP models:

PYTHONPATH=. python experiments/it15/vis98.py -l <layer_number> -f <feature_number> -n <network_number> -v <tv_coefficient>

For example:

PYTHONPATH=. python experiments/it15/vis98.py -l 4 -f 20 -n 98 -v 0.1

For the ViT models, the -n option should be in [34, 35, 36, 37, 38, 39]; for the CLIP models, it should be in [94, 95, 96, 97, 98, 99].

To list all the available network numbers use:

python show_models.py

Here we list some of them:

  • 34: ViT0_B_16_imagenet1k
  • 35: ViT1_B_32_imagenet1k
  • 36: ViT2_L_16_imagenet1k
  • 37: ViT3_L_32_imagenet1k
  • 38: ViT4_B_16
  • 39: ViT5_B_32
  • 94: CLIP0_RN50
  • 95: CLIP1_RN101
  • 96: CLIP2_RN50x4
  • 97: CLIP3_RN50x16
  • 98: CLIP4_ViT-B/32
  • 99: CLIP5_ViT-B/16

We use the timm library to load the pretrained models. After running these commands, you can find the visualizations in the desktop folder.
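As a minimal sketch of that timm loading step (the exact timm model names the repo uses are an assumption here):

import timm
import torch

# Load a pretrained ViT-B/16 from timm (model name is an assumed example).
model = timm.create_model("vit_base_patch16_224", pretrained=True)
model.eval()

# One dummy forward pass to confirm the weights load and run.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) for an ImageNet-1k head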

Other experiments done in the paper can be found in the experiments folder.

For experiments that need to load the ImageNet dataset, such as the isolating-CLS experiment, the code assumes the dataset is in data/imagenet/train for the training set and data/imagenet/val for the validation set.
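A minimal sketch of that directory layout, assuming the standard torchvision ImageFolder convention (one subfolder per class); the repo's actual transforms may differ:

from torchvision import datasets, transforms

# Basic preprocessing as an example; adjust to the experiment's needs.
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/imagenet/train", transform=tfm)
val_set = datasets.ImageFolder("data/imagenet/val", transform=tfm)
print(len(train_set), len(val_set))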

We will update the README with more instructions on how to run the other experiments soon.
