How to load PyTorch checkpoints into JAX/Flax? #927

marcvanzee · 2021-01-18T11:17:50Z

marcvanzee
Jan 18, 2021
Maintainer


marcvanzee · 2021-01-22T12:46:34Z

marcvanzee
Jan 22, 2021
Maintainer Author

PyTorch checkpoints contain a state_dict with all of the model's weights/parameters. Converting it to Flax involves:

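Defining the model using Flax modules.
Renaming the dictionary items so they line up with the Flax parameter names, and transposing conv weights from PyTorch's (out, in, H, W) layout to the (H, W, in, out) layout that Flax conv kernels use.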
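
The renaming and layout steps can be sketched roughly as follows. This is a minimal illustration, not the code from any of the linked repositories. It assumes the state_dict values have already been turned into NumPy arrays (for example with `tensor.cpu().numpy()`), and it uses a hypothetical model with layers named `conv1` and `fc`. The helper names are made up for this example, and the rule of picking the transpose from the number of dimensions is an assumption that only covers conv and linear weights; embeddings, normalization layers and the like need their own rules.

```python
import numpy as np


def convert_conv_weight(w):
    """PyTorch Conv2d weight (out, in, H, W) -> Flax conv kernel (H, W, in, out)."""
    return np.transpose(w, (2, 3, 1, 0))


def convert_linear_weight(w):
    """PyTorch Linear weight (out, in) -> Flax Dense kernel (in, out)."""
    return np.transpose(w, (1, 0))


def convert_state_dict(state_dict):
    """Rename 'layer.weight' style keys to ('layer', 'kernel') style tuples.

    Assumption: every 4-D 'weight' is a conv weight and every 2-D 'weight'
    is a linear weight. Other parameters are copied unchanged.
    """
    flat = {}
    for name, value in state_dict.items():
        *path, leaf = name.split(".")
        value = np.asarray(value)
        if leaf == "weight":
            leaf = "kernel"
            if value.ndim == 4:
                value = convert_conv_weight(value)
            elif value.ndim == 2:
                value = convert_linear_weight(value)
        flat[tuple(path) + (leaf,)] = value
    return flat


# Stand-in for a real PyTorch state_dict (hypothetical layer names and shapes).
state_dict = {
    "conv1.weight": np.zeros((8, 3, 5, 5), dtype=np.float32),
    "conv1.bias": np.zeros((8,), dtype=np.float32),
    "fc.weight": np.zeros((10, 128), dtype=np.float32),
    "fc.bias": np.zeros((10,), dtype=np.float32),
}

flat = convert_state_dict(state_dict)
for key, value in flat.items():
    print(key, value.shape)
```

The result is a flat dict keyed by tuples, which is the same shape of key that flax.traverse_util.flatten_dict produces by default, so it can be compared against a freshly initialized Flax model's parameters before nesting it again.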

flax.traverse_util.flatten_dict is often useful here, because you then only need to operate on a flat dict rather than a nested one. Once the keys align, use unflatten_dict to restore the nested form.

@nikitakit wrote the following code for importing PyTorch BERT checkpoints into a Flax model: https://github.com/nikitakit/flax_bert/blob/master/import_weights.py

6 replies

avital Jan 26, 2021

Here's another example from Hugging Face BERT: https://github.com/huggingface/transformers/blob/a880f2549fd5652030afc244f3bb27ec764c5e43/src/transformers/models/bert/modeling_flax_bert.py#L452

GCP20 Mar 6, 2024

Hi @avital, I think this feature is not available as an API in the current transformers library. Is there an "official" way of doing this?

GCP20 Mar 7, 2024

This can be done with the "from_pt" argument (if you are loading a pre-trained model). I'm attaching screenshots that show this.

davisyoshida Mar 11, 2024

@GCP20 There's no generic solution because people can choose to name their parameters different things between the two implementations. I wrote a helper script that does 90% of the work using a string similarity bipartite match, then I clean up the remainder manually.
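
The helper script itself is not shown in this thread. One possible way to do a string-similarity bipartite match is sketched below; this is an assumption about the general idea, not the author's implementation. It scores every pair of names with `difflib.SequenceMatcher` and solves the assignment with `scipy.optimize.linear_sum_assignment`. The parameter names are made up for the example.

```python
from difflib import SequenceMatcher

import numpy as np
from scipy.optimize import linear_sum_assignment


def match_names(pt_names, flax_names):
    """Pair each PyTorch name with a Flax name, maximizing total similarity."""
    sim = np.array([
        [SequenceMatcher(None, p, f).ratio() for f in flax_names]
        for p in pt_names
    ])
    # Each name is used at most once on either side (a bipartite matching).
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return {
        pt_names[r]: (flax_names[c], float(sim[r, c]))
        for r, c in zip(rows, cols)
    }


# Hypothetical parameter names from the two implementations.
pt_names = [
    "encoder.layer.0.attention.query.weight",
    "encoder.layer.0.attention.query.bias",
    "pooler.dense.weight",
]
flax_names = [
    "pooler/dense/kernel",
    "encoder/layer_0/attention/query/bias",
    "encoder/layer_0/attention/query/kernel",
]

mapping = match_names(pt_names, flax_names)
for pt_name, (flax_name, score) in mapping.items():
    print(f"{pt_name} -> {flax_name} ({score:.2f})")
```

Pairs with a low similarity score are the ones to review by hand, and comparing array shapes for each proposed pair is a cheap extra check that the match is plausible.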

GinRawin Apr 13, 2024

Thank you all for your help. This method also works.
