Add Convolutional vision Transformer (CvT) #2176

fffffgggg54 · 2024-05-14T01:07:36Z

CvT as described in https://arxiv.org/abs/2103.15808

Swin-era hierarchical transformer. From-scratch reimplementation of the original (https://github.com/microsoft/CvT/tree/main); cleaner than the original, exposes most module cfgs as kwargs, and uses sdpa/timm style. WIP with a barebones test for now. Currently stuck: the weight remap succeeds, but activations are incorrect, and the error seems to come, at least in part, from the BatchNorm layers.
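For readers unfamiliar with the weight remap step, here is a minimal sketch of renaming checkpoint keys from one naming scheme to another and checking the result against the keys the new model expects. The rename rules and key names are hypothetical, chosen only for illustration; they are not the actual keys of the original CvT checkpoint or of the `cvt.py` in this PR. Plain dicts stand in for state dicts so the sketch runs without any dependencies.

```python
import re

# Hypothetical (pattern, replacement) rules. The real key names of either
# implementation are not stated in this PR description.
REMAP_RULES = [
    (r"^stage(\d+)\.", r"stages.\1."),
    (r"\.attn\.proj_q\.", r".attn.q_proj."),
]


def remap_state_dict(state_dict, rules=REMAP_RULES):
    """Return a new dict with every key rewritten by the rules, in order."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, repl in rules:
            new_key = re.sub(pattern, repl, new_key)
        if new_key in out:
            # Two source keys mapped to the same target: a rule is too broad.
            raise KeyError(f"remap collision on {new_key!r}")
        out[new_key] = value
    return out


def diff_keys(remapped, expected_keys):
    """Return (missing, unexpected) key lists relative to the expected keys."""
    remapped_keys = set(remapped)
    expected = set(expected_keys)
    return sorted(expected - remapped_keys), sorted(remapped_keys - expected)
```

An empty `missing` and `unexpected` only shows that the names line up. It says nothing about whether the forward pass matches, which is the situation described above: the remap succeeds while the activations are still wrong.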

HuggingFaceDocBuilderDev · 2024-05-14T01:09:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fffffgggg54 · 2024-05-20T20:52:15Z

Validation for cvt-13 lines up with the paper (81.678 top-1 for me). Activations differ from the reference impl by minute amounts (MSE of the logits for 1 sample is on the order of 1e-10). The initial problems had to do with the norm before attn and the attn residual when there is a cls_token. I'll most likely finish this off later today.
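The logit comparison mentioned above can be sketched as a mean-squared-error check between the two implementations' outputs for the same input. This is a dependency-free illustration over plain lists of floats, not the test used in this PR; the function names and the `tol=1e-8` threshold are assumptions. With real tensors, `torch.nn.functional.mse_loss` or `torch.testing.assert_close` would be the usual tools.

```python
def logit_mse(a, b):
    """Mean squared error between two equal-length sequences of logits."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def close_enough(a, b, tol=1e-8):
    """True if the logit MSE is within tol (the threshold is an assumption)."""
    return logit_mse(a, b) <= tol
```

A per-sample MSE around 1e-10 would pass this check, while a structural mistake such as a misplaced norm or residual would be expected to produce a far larger error.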


fffffgggg54 added 13 commits December 27, 2023 07:57

wip

5f19928

Update cvt.py

2df705a

Update cvt.py

c737410

Update cvt.py

43e363e

Update cvt.py

3b89b47

Update cvt.py

396e8a5

Update cvt.py

8e6c567

Merge branch 'huggingface:main' into cvt

1f0cf09

wip

63532c2

Update cvt.py

c7120f6

Update cvt.py

a1c4c1e

Update cvt.py

7a33ff4

Merge branch 'huggingface:main' into cvt

c63ee94

fffffgggg54 added 11 commits May 20, 2024 10:02

Update cvt.py

1cdedea

Update cvt.py

187208f

Update cvt.py

0aadb30

Update cvt.py

b06907b

Update cvt.py

832c155

oh xd i feel stupid

e3e3b3f

Update cvt.py

6c896b1

Update cvt.py

df05c0d

Update cvt.py

025d8a4

Update cvt.py

183a5da

remove probes

186dab3

fffffgggg54 added 4 commits May 20, 2024 15:00

Update cvt.py

e69b906

Merge branch 'huggingface:main' into cvt

7ba93ae

Cvt 1 (#14)

efa1a36

* Update cvt.py * Update cvt.py

Update cvt.py

e022a47
