
[BUG] Bottleneck adapters do not work with the ViT model when original_ln_after = False #764

Closed
julian-fong opened this issue Dec 2, 2024 · 3 comments
Labels: question (Further information is requested)

@julian-fong (Contributor)

It seems like the ViT model does not train well with bottleneck adapter configs when the parameter original_ln_after is set to False.

To reproduce

from datasets import load_dataset
import torch
num_classes = 100
train_dataset = load_dataset("uoft-cs/cifar100", split = "train").select(range(10000))
eval_dataset = load_dataset("uoft-cs/cifar100", split = "test").select(range(1000))

train_dataset.set_format("torch")
eval_dataset.set_format("torch")

model_name_or_path = 'google/vit-base-patch16-224-in21k'

from transformers import ViTImageProcessor
processor = ViTImageProcessor.from_pretrained(model_name_or_path)

def preprocess_image(example):
  image = processor(example["img"], return_tensors='pt')
  image["label"] = example["fine_label"]
  return image


train_dataset = train_dataset.map(preprocess_image)
eval_dataset = eval_dataset.map(preprocess_image)
# remove unnecessary columns
train_dataset = train_dataset.remove_columns(['img', 'fine_label', 'coarse_label'])
eval_dataset = eval_dataset.remove_columns(['img', 'fine_label', 'coarse_label'])


from typing import Any
from dataclasses import dataclass

@dataclass
class DataCollator:
  processor : Any
  def __call__(self, inputs):

    pixel_values = [input["pixel_values"].squeeze() for input in inputs]
    labels = [input["label"] for input in inputs]

    pixel_values = torch.stack(pixel_values)
    labels = torch.stack(labels)
    return {
        'pixel_values': pixel_values,
        'labels': labels,
    }

data_collator = DataCollator(processor = processor)


from adapters import ViTAdapterModel

model = ViTAdapterModel.from_pretrained(model_name_or_path)

from adapters import BnConfig
# bottleneck adapter config that triggers the issue: original_ln_after=False
config = BnConfig(mh_adapter=False, output_adapter=True, reduction_factor=96, non_linearity="relu", original_ln_after=False)
model.add_adapter("bottleneck_adapter", config=config)
model.add_image_classification_head("bottleneck_adapter", num_labels=num_classes)
model.train_adapter("bottleneck_adapter")  # freeze the base model, train only the adapter + head

import numpy as np
import evaluate
accuracy = evaluate.load("accuracy")

def compute_metrics(p):
  return accuracy.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)

from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./training_results',
    eval_strategy='epoch',
    learning_rate=10e-3,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=5,
    weight_decay=10e-4,
    report_to = "none",
    remove_unused_columns=False,
)

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=processor,
    compute_metrics = compute_metrics
)

trainer.train()

[screenshot: training output]

julian-fong added the bug (Something isn't working) label Dec 2, 2024
@calpt (Member) commented Jan 2, 2025

I did some experimentation with varying configs, based on the script you provided:
[results table: eval accuracy for the tested config combinations]

In general, at least one of original_ln_before or original_ln_after should be set to True to make sure the original residual connection from pre-training is preserved.
When original_ln_after=False, training only seems to converge if residual_before_ln=False, so these two should be used in combination in the example provided.

Since training does work with certain combinations of config values, I don't believe there's a general issue in the implementation here; we just need to make sure to select the right combination of values. (Maybe we could add these notes as tips in a suitable place in the notebooks/docs.)
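For reference, a config combining both of these recommendations could look like the following (a sketch based on the notes above, reusing the values from the reproduction script; not an exhaustively verified setup):

from adapters import BnConfig

# Variant A: keep the original post-block LayerNorm (the default behaviour)
config = BnConfig(
    mh_adapter=False,
    output_adapter=True,
    reduction_factor=96,
    non_linearity="relu",
    original_ln_after=True,
)

# Variant B: if original_ln_after=False is wanted, preserve the pre-trained
# residual via original_ln_before=True and pair it with residual_before_ln=False,
# the combination that converged when original_ln_after was disabled
config = BnConfig(
    mh_adapter=False,
    output_adapter=True,
    reduction_factor=96,
    non_linearity="relu",
    original_ln_before=True,
    original_ln_after=False,
    residual_before_ln=False,
)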

(edit: replaced results image)

calpt added the question (Further information is requested) label and removed the bug (Something isn't working) label Jan 2, 2025
calpt closed this as completed Jan 2, 2025
@julian-fong (Contributor, Author)

I can provide some updates inside #775 since we are already planning to fix the config. Do you think this is a suitable place to put these notes?

@calpt (Member) commented Jan 4, 2025

Sounds good. Things specific to AdapterPlus fit well in the place you linked. The notes on layer norm apply to bottleneck adapters in general, so we could add them to this section of the docs in one of those blue "Note" boxes to make them easier to discover.
