Multi-modal models and optional inputs #4293
Hey! There is no official example yet on modeling multiple modalities, but your data points (i.e., dataset items which are combined into batches) and module inputs can simply be represented with an `Option`. An example module:

```rust
#[derive(Module, Debug)]
pub struct MultiModalModel<B: Backend> {
    optional_1: Option<Linear<B>>,
    optional_2: Option<Linear<B>>,
    concat: Linear<B>,
}
```

though in your case I think the submodules are not necessarily optional, just the computation based on the optional inputs:

```rust
impl<B: Backend> MultiModalModel<B> {
    pub fn forward(
        &self,
        image: Option<Tensor<B, 4>>,
        spectral: Option<Tensor<B, 3>>,
        // etc.
    ) -> Tensor<B, 2> {
        if let Some(image) = image {
            // Compute for image
        }
        if let Some(spectral) = spectral {
            // Compute for spectral
        }
        // ... then combine the per-modality features and return the output
    }
}
```

For the data, you can represent each item type with multiple `Option` fields. Lmk if I missed anything from the original question.
Hi,
I have recently started to look into burn as a viable high-level crate to use in my lab instead of writing our own kernels, but I might be having some vocabulary difficulties in finding the aforementioned basic features in the documentation of the library.
Most of the datasets we work on have variable features across data points (e.g. some have images, some have text, some have spectral properties, and some have any combination of the above). We use the rather trivial approach of dispatching each available feature to the appropriate input submodule and then applying a concatenation/dense layer on top, avoiding any computation/backprop for the unused modules.
How are multi-modality & optional input features handled in burn? Are there examples somewhere?
Best,
Luca