update doc
tanganke committed May 15, 2024
1 parent 33118c0 commit 50e32c3
Showing 6 changed files with 82 additions and 189 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,4 +1,4 @@
-/outputs/
+outputs/
/.vscode/

# Byte-compiled / optimized / DLL files
2 changes: 1 addition & 1 deletion docs/algorithms/adamerging.md
@@ -27,4 +27,4 @@ where the merging coefficient $\lambda^{l}_{i}$ and task vector $\tau^{l}_{i}$ a

By leveraging this adaptive learning approach, AdaMerging significantly enhances the model's ability to generalize across tasks and layers, resulting in a more robust and finely-tuned performance profile. The method’s reliance on entropy minimization ensures that the merging process continually seeks the most informative and stable configuration, adapting to the specific needs of the dataset and tasks at hand.

-[^1]: (ICLR 2024) AdaMerging: Adaptive Model Merging for Multi-Task Learning. http://arxiv.org/abs/2310.02575
+[^1]: (ICLR 2024) AdaMerging: Adaptive Model Merging for Multi-Task Learning. https://openreview.net/pdf?id=nZP6NgD3QY
80 changes: 80 additions & 0 deletions docs/modelpool/clip_vit.md
@@ -1,3 +1,83 @@
# CLIP-ViT Models for Open Vocabulary Image Classification

Here we provide a list of CLIP-ViT models that are trained for open-vocabulary image classification.

## The Eight Tasks

The eight tasks most commonly used in the research community are SUN397, Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, and DTD.
These tasks cover a wide range of domains, including natural images, satellite images, and digit recognition.
You can download the datasets from [this HuggingFace Collection](https://huggingface.co/collections/tanganke/the-eight-image-classification-tasks-6644ce0376c0a469f6928507) or load them with the `datasets` library as follows:

```python
from datasets import load_dataset

# take `gtsrb` as an example
dataset = load_dataset("tanganke/gtsrb")

train_dataset = dataset["train"]
test_dataset = dataset["test"]
```
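
Each split behaves like a regular `datasets` dataset. As a quick sanity check, here is a minimal sketch of inspecting a single example; the `image`/`label` column names and the `ClassLabel` feature are assumptions on our part, not guaranteed by the text above.

```python
# inspect one training example; the column names below are assumptions
example = train_dataset[0]
print(example["image"])   # a PIL image
print(example["label"])   # an integer class index
print(train_dataset.features["label"].names[example["label"]])  # human-readable class name
```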

The authors of Task Arithmetic fine-tuned the CLIP-ViT models from the *open_clip* library on these eight tasks and released the checkpoints publicly on [Google Drive](https://drive.google.com/drive/folders/1u_Tva6x0p6oxu5Eo0ZZsf-520Cc_3MKw?usp=share_link).
However, these models rely on a specific version of the *open_clip* library.

To make experiments more convenient and avoid dependency on a specific library version, we have re-trained these models and made them publicly available on the HuggingFace Model Hub.
We use the Adam optimizer with a fixed learning rate of 1e-5 for 4000 training steps (batch_size=32).
Only the vision encoder is fine-tuned, while the text encoder remains fixed to preserve the open-vocabulary property of the model (a minimal sketch of this setup is given after the model links below).

- [fine-tuned CLIP-ViT-B/32 models](https://huggingface.co/collections/tanganke/clip-vit-b-32-on-the-eight-image-classication-tasks-6644d0c476c0a469f693cf91)
- [fine-tuned CLIP-ViT-L/14 models](https://huggingface.co/collections/tanganke/clip-vit-l-14-on-the-eight-image-classification-tasks-6644d2b014331c746683de63)
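
As an illustration of this fine-tuning setup, the following is a minimal sketch rather than the exact training script behind the released checkpoints: the prompt template, the naive random batching, and the assumption that each dataset exposes `image` and `label` columns (with `label` as a `ClassLabel` feature) are ours.

```python
import random

import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# freeze everything, then unfreeze only the vision encoder
for p in model.parameters():
    p.requires_grad_(False)
for p in model.vision_model.parameters():
    p.requires_grad_(True)

train_dataset = load_dataset("tanganke/gtsrb")["train"]
class_names = train_dataset.features["label"].names        # assumes a ClassLabel feature
prompts = [f"a photo of a {name}" for name in class_names]  # prompt template is an assumption

# pre-compute text features once; the text encoder stays fixed
with torch.no_grad():
    text_inputs = processor(text=prompts, return_tensors="pt", padding=True)
    text_features = F.normalize(model.get_text_features(**text_inputs), dim=-1)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

model.train()
for step in range(4000):
    # naive random batching for illustration only (batch_size=32)
    indices = random.sample(range(len(train_dataset)), 32)
    batch = train_dataset[indices]
    inputs = processor(images=batch["image"], return_tensors="pt")
    image_features = F.normalize(model.get_image_features(**inputs), dim=-1)
    logits = model.logit_scale.exp() * image_features @ text_features.t()
    loss = F.cross_entropy(logits, torch.tensor(batch["label"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the text encoder is frozen, its class-prompt features can be pre-computed once and reused at every training step.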

To use these models, you can load them with the Transformers library as follows.

First, load the fine-tuned vision backbone:

```python
from transformers import CLIPVisionModel

# load the CLIP-ViT-B/32 model, take `gtsrb` as an example
vision_model = CLIPVisionModel.from_pretrained('tanganke/clip-vit-base-patch32_gtsrb')
```

Then, substitute the vision encoder of the pre-trained CLIP model with the fine-tuned one:

```python
from transformers import CLIPProcessor, CLIPModel

# load pre-trained CLIP model
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# substitute the vision model with the fine-tuned one
clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
```
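
With the fine-tuned vision encoder plugged in, the combined model can still be used for open-vocabulary classification. Below is a minimal sketch that continues from the snippets above (`clip_model` is the model with the substituted vision encoder); as before, the `image`/`label` columns, the `ClassLabel` feature, and the prompt template are assumptions.

```python
import torch
from datasets import load_dataset
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

dataset = load_dataset("tanganke/gtsrb")
image = dataset["test"][0]["image"]                     # a PIL image (assumed column name)
class_names = dataset["test"].features["label"].names   # assumes a ClassLabel feature
prompts = [f"a photo of a {name}" for name in class_names]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = clip_model(**inputs)  # `clip_model` from the snippet above
probs = outputs.logits_per_image.softmax(dim=-1)
print("predicted class:", class_names[probs.argmax(dim=-1).item()])
```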

### Use Cases

To use these models from our FusionBench library, you can specify the modelpool configuration file as follows:

```yaml title="config/modelpool/clip-vit-base-patch32_TA8.yaml"
type: huggingface_clip_vision
models:
- name: _pretrained_
path: openai/clip-vit-base-patch32
- name: sun397
path: tanganke/clip-vit-base-patch32_sun397
- name: stanford_cars
path: tanganke/clip-vit-base-patch32_stanford-cars
- name: resisc45
path: tanganke/clip-vit-base-patch32_resisc45
- name: eurosat
path: tanganke/clip-vit-base-patch32_eurosat
- name: svhn
path: tanganke/clip-vit-base-patch32_svhn
- name: gtsrb
path: tanganke/clip-vit-base-patch32_gtsrb
- name: mnist
path: tanganke/clip-vit-base-patch32_mnist
- name: dtd
path: tanganke/clip-vit-base-patch32_dtd
```
The type of the modelpool is `huggingface_clip_vision`, corresponding to the modelpool class `HuggingFaceClipVisionPool`.
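
To get a feel for how such a configuration file is structured, it can be loaded directly with OmegaConf (which Hydra-based projects build on). This is only an illustrative sketch: the relative path assumes you run it from the repository root, and it does not instantiate the modelpool class itself.

```python
from omegaconf import OmegaConf

# load the modelpool configuration shown above (path assumes the repository root)
config = OmegaConf.load("config/modelpool/clip-vit-base-patch32_TA8.yaml")
print(config.type)                      # "huggingface_clip_vision"
print([m.name for m in config.models])  # ["_pretrained_", "sun397", ..., "dtd"]
```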

::: fusion_bench.modelpool.HuggingFaceClipVisionPool

27 changes: 0 additions & 27 deletions fusion_bench/outputs/cli/2024-05-15_16-42-06/.hydra/config.yaml

This file was deleted.

159 changes: 0 additions & 159 deletions fusion_bench/outputs/cli/2024-05-15_16-42-06/.hydra/hydra.yaml

This file was deleted.

This file was deleted.
