Perceiver

Dec 22, 2021

This branch is 291 commits behind NielsRogge/Transformers-Tutorials:master.

Name	Last commit message	Last commit date
Fine_tune_Perceiver_for_text_classification.ipynb	Created with Colaboratory	Dec 8, 2021
Fine_tune_the_Perceiver_for_image_classification.ipynb	Use Image feature	Dec 22, 2021
Perceiver_for_Multimodal_Autoencoding.ipynb	Remove einops	Dec 8, 2021
Perceiver_for_Optical_Flow.ipynb	Created with Colaboratory	Dec 8, 2021
Perceiver_for_masked_language_modeling_and_image_classification.ipynb	Improve notebook	Dec 11, 2021
README.md	Update README.md	Dec 11, 2021

README.md

Perceiver IO notebooks

In this directory, you can find several notebooks that illustrate how to use DeepMind's Perceiver IO, both for fine-tuning on custom data and for inference. Some are based on the official Colab notebooks released by DeepMind; the others are additional notebooks which I believe will be helpful for the community.

The available notebooks are:

showcasing masked language modeling and image classification with the Perceiver
fine-tuning the Perceiver for image classification
fine-tuning the Perceiver for text classification
predicting optical flow between a pair of images with PerceiverForOpticalFlow
auto-encoding a video (images, audio, labels) with PerceiverForMultimodalAutoencoding

Note that these are just a few examples of what you can do with the Perceiver. There are many more possibilities, such as question answering, named-entity recognition on text, object detection on images, audio classification, and so on. Basically, anything you can do with BERT/ViT/Wav2Vec2/DETR/etc., you can do with the Perceiver too.

The Perceiver and its follow-up variant, Perceiver IO, both by Google DeepMind, are among my favorite works of 2021.

The model is quite elegant: it avoids the quadratic complexity of self-attention over the inputs by applying self-attention to a (not too large) set of latent variables instead. The inputs are only used for cross-attention with the latents. That way, the size of the inputs (which can be text, image, audio, video, ...) has no impact on the memory and compute requirements of the self-attention operations; only the cross-attention cost grows with input size, and it grows linearly.
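To make the cost argument concrete, here is a rough back-of-the-envelope sketch that counts attention scores (query-key pairs) for a standard Transformer versus a latent bottleneck. The function names and all sizes (one token per pixel of a 224x224 image, 512 latents, 8 layers) are illustrative assumptions, not values taken from the notebooks.

```python
def self_attention_scores(num_tokens: int) -> int:
    """Pairwise attention scores for one self-attention layer over num_tokens."""
    return num_tokens * num_tokens


def perceiver_scores(num_inputs: int, num_latents: int, num_layers: int) -> int:
    """One latents-to-inputs cross-attention plus num_layers latent self-attentions."""
    cross = num_latents * num_inputs                              # linear in the input size
    latent_self = num_layers * self_attention_scores(num_latents)  # independent of the input size
    return cross + latent_self


# Illustrative sizes (assumptions, not taken from the notebooks).
num_inputs = 224 * 224   # one token per pixel
num_latents = 512
num_layers = 8

full = num_layers * self_attention_scores(num_inputs)
bottleneck = perceiver_scores(num_inputs, num_latents, num_layers)

print(f"self-attention on inputs:  {full:,} scores")
print(f"latent bottleneck:         {bottleneck:,} scores")
```

Doubling `num_inputs` only adds `num_latents * num_inputs` scores in the bottleneck case, whereas it quadruples the count when self-attention runs directly on the inputs.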

In the Perceiver IO paper, the authors extend this so that the Perceiver can also produce arbitrary outputs, in addition to handling arbitrary inputs. The idea is similar: the outputs are only involved in cross-attention with the latents, with one query per desired output element.
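The encode/process/decode pattern described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the implementation used in the notebooks: `TinyPerceiverIO` and all of its sizes are hypothetical, and residual connections, layer norms and MLPs around the cross-attention layers are left out for brevity.

```python
import torch
from torch import nn


class TinyPerceiverIO(nn.Module):
    """Hypothetical, heavily simplified sketch of the Perceiver IO data flow."""

    def __init__(self, input_dim, query_dim, output_dim,
                 num_latents=32, latent_dim=64, num_layers=2, num_heads=4):
        super().__init__()
        # Learned latent array, shared across the batch.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # Encode: latents (queries) cross-attend to the inputs (keys/values).
        self.encode = nn.MultiheadAttention(
            latent_dim, num_heads, kdim=input_dim, vdim=input_dim, batch_first=True)
        # Process: self-attention on the latents only.
        self.process = nn.ModuleList([
            nn.TransformerEncoderLayer(latent_dim, num_heads,
                                       dim_feedforward=4 * latent_dim, batch_first=True)
            for _ in range(num_layers)
        ])
        # Decode: output queries cross-attend to the latents (keys/values).
        self.decode = nn.MultiheadAttention(
            query_dim, num_heads, kdim=latent_dim, vdim=latent_dim, batch_first=True)
        self.head = nn.Linear(query_dim, output_dim)

    def forward(self, inputs, output_queries):
        # inputs: (batch, num_inputs, input_dim)
        # output_queries: (batch, num_outputs, query_dim)
        latents = self.latents.unsqueeze(0).expand(inputs.shape[0], -1, -1)
        latents, _ = self.encode(latents, inputs, inputs)
        for layer in self.process:
            latents = layer(latents)
        decoded, _ = self.decode(output_queries, latents, latents)
        return self.head(decoded)  # (batch, num_outputs, output_dim)


model = TinyPerceiverIO(input_dim=16, query_dim=64, output_dim=10).eval()
inputs = torch.randn(2, 1000, 16)   # 1000 input elements per example
queries = torch.randn(2, 5, 64)     # 5 desired output elements per example
with torch.no_grad():
    outputs = model(inputs, queries)
print(outputs.shape)
```

The number of output elements is set by the number of queries, and the number of input elements can change freely between calls, because neither appears in the shape of any learned weight.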

The authors show that the model achieves strong results across a variety of tasks and modalities, including masked language modeling, image classification, optical flow, multimodal autoencoding and games.

The various models differ in their preprocessor, decoder and optional postprocessor. I've implemented all models that DeepMind open-sourced (originally written in JAX/Haiku) in PyTorch.
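As a rough illustration of what a preprocessor does, the sketch below turns two different modalities into the same kind of `(batch, num_elements, dim)` array, which is all a shared latent core needs to see. `BytePreprocessor` and `PixelPreprocessor` are hypothetical names with illustrative sizes, not the classes used in the notebooks, and the pixel version omits position encodings for brevity.

```python
import torch
from torch import nn


class BytePreprocessor(nn.Module):
    """Hypothetical text preprocessor: byte IDs -> embeddings plus learned positions."""

    def __init__(self, vocab_size=256, max_len=2048, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_len, dim)

    def forward(self, byte_ids):                      # (batch, length)
        positions = torch.arange(byte_ids.shape[1], device=byte_ids.device)
        return self.embed(byte_ids) + self.pos(positions)   # (batch, length, dim)


class PixelPreprocessor(nn.Module):
    """Hypothetical image preprocessor: one element per pixel, projected to dim."""

    def __init__(self, channels=3, dim=64):
        super().__init__()
        self.proj = nn.Linear(channels, dim)

    def forward(self, images):                        # (batch, channels, height, width)
        pixels = images.flatten(2).transpose(1, 2)    # (batch, height * width, channels)
        return self.proj(pixels)                      # (batch, height * width, dim)


text = torch.randint(0, 256, (2, 50))   # 2 sequences of 50 bytes
images = torch.randn(2, 3, 8, 8)        # 2 RGB images of 8x8 pixels

text_elements = BytePreprocessor()(text)
image_elements = PixelPreprocessor()(images)
print(text_elements.shape, image_elements.shape)
```

Both results share the same trailing dimension, so the same cross-attention encoder could consume either one; the decoder and postprocessor play the mirror-image role on the output side.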
