Update CvT documentation with improved usage examples and additional … #38731
cc @stevhliu
Cool, thanks for your update!
docs/source/en/model_doc/cvt.md
Outdated
# TODO: add information about CvT in simple language like Vit
Convolutional Vision Transformer (CvT) is a model that combines the strengths of convolutional neural networks (CNNs) and transformers for computer vision tasks. It introduces convolutional layers into the transformer architecture, allowing it to capture local patterns in images while maintaining the global context provided by self-attention mechanisms.
You can find all the CvT checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=cvt) organization.
# TODO: add information about CvT in simple language like Vit
Convolutional Vision Transformer (CvT) is a model that combines the strengths of convolutional neural networks (CNNs) and transformers for computer vision tasks. It introduces convolutional layers into the transformer architecture, allowing it to capture local patterns in images while maintaining the global context provided by self-attention mechanisms.
You can find all the CvT checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=cvt) organization.
[Convolutional Vision Transformer (CvT)](https://huggingface.co/papers/2103.15808) is a model that combines the strengths of convolutional neural networks (CNNs) and transformers for computer vision tasks. It introduces convolutional layers into the transformer architecture, allowing it to capture local patterns in images while maintaining the global context provided by self-attention mechanisms.
You can find all the original CvT checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=cvt) organization.
- CvT models integrate convolutions into the Vision Transformer architecture, aiming to combine the strengths of both CNNs and Transformers. This can lead to improved performance and efficiency on vision tasks.
- Use [`AutoImageProcessor`] for preprocessing images for CvT models. This typically includes resizing, rescaling, and normalizing the input images to match the model's training configuration.
- The original ViT demo notebooks, such as those found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer), can often be adapted for CvT. You would typically replace ViT-specific classes like `ViTFeatureExtractor` with `AutoImageProcessor` and `ViTForImageClassification` with `CvtForImageClassification` or `AutoModelForImageClassification` using a CvT checkpoint.
- CvT checkpoints available on the Hugging Face Hub are often pre-trained on large-scale datasets like ImageNet-22k and may also be fine-tuned on datasets like ImageNet-1k.
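As a rough illustration of the rescaling and normalization steps mentioned in the notes above, the per-pixel arithmetic can be sketched in plain Python. The mean and standard deviation used here are the common ImageNet values and are an assumption; the values actually applied come from the checkpoint's preprocessor configuration. `rescale_and_normalize` is a hypothetical helper for illustration, not part of Transformers.

```python
# Assumed defaults (common ImageNet statistics); a real checkpoint's
# preprocessor configuration is the source of truth for these values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def rescale_and_normalize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD, scale=1 / 255):
    """Map one RGB pixel from 0-255 integers to normalized floats.

    Hypothetical helper: rescales each channel to [0, 1], then subtracts
    the channel mean and divides by the channel standard deviation.
    """
    return tuple((channel * scale - m) / s for channel, m, s in zip(pixel, mean, std))


if __name__ == "__main__":
    # A pure-white pixel ends up as positive values on every channel.
    print(rescale_and_normalize((255, 255, 255)))
```

In practice [`AutoImageProcessor`] performs these steps, plus resizing, so this arithmetic rarely needs to be written by hand.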
## Resources
Remove the below and replace with:
Refer to this set of ViT [notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer) for examples of inference and fine-tuning on custom datasets. Replace [`ViTFeatureExtractor`] and [`ViTForImageClassification`] in these notebooks with [`AutoImageProcessor`] and [`CvtForImageClassification`].
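A minimal sketch of what that swap might look like for inference, assuming the `microsoft/cvt-13` checkpoint (a name not given in this thread) and a PIL image supplied by the caller. `classify` is a hypothetical helper; the heavy imports are deferred into the function so the snippet can be loaded without them, and calling it downloads the checkpoint from the Hub.

```python
# Assumed checkpoint name; any CvT image-classification checkpoint should work.
CHECKPOINT = "microsoft/cvt-13"


def classify(image, checkpoint=CHECKPOINT):
    """Return the predicted label string for a PIL image (hypothetical helper)."""
    # Deferred imports: torch and transformers are only needed when called.
    import torch
    from transformers import AutoImageProcessor, CvtForImageClassification

    # These two classes take the place of ViTFeatureExtractor and
    # ViTForImageClassification in the ViT notebooks.
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = CvtForImageClassification.from_pretrained(checkpoint)
    model.eval()

    # The processor resizes, rescales, and normalizes the image.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Map the highest-scoring class index back to its label.
    predicted_id = logits.argmax(-1).item()
    return model.config.id2label[predicted_id]
```

Calling `classify(Image.open("example.jpg"))` with a local image (the file name is a placeholder) would return the predicted label for that image.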
Co-authored-by: Steven Liu <[email protected]>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for your contribution! 🤗
…notes
What does this PR do?
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.