How to convert a model to parameter only checkpoints (unscanned) on a CPU VM #634
Comments
Some more context, in this line the |
Here is a quick update - the issue was the base image of my CPU VM - I changed it from the default image to the DLVM image ( |
When converting a checkpoint to a parameter-only checkpoint using the maxtext/MaxText/generate_param_only_checkpoint.py script on a CPU device, the script fails with an error.

Looking closely at max_utils.maybe_initialize_jax_distributed_system(), I noticed that the GPU condition is met and it calls jax.distributed.initialize().

Changing the conditions in max_utils.maybe_initialize_jax_distributed_system() to force calling initialize_jax_for_cpu results in a different error.

It seems that max_utils.get_coordinator_ip_address() doesn't handle the situation in which JAX_COORDINATOR_ADDRESS is None, and/or it's unclear how it is supposed to work on a CPU device.
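To illustrate the kind of guard I would expect around JAX_COORDINATOR_ADDRESS, here is a minimal sketch. It is not MaxText's code: the helper name resolve_coordinator_address, the localhost fallback, and the default port 1234 are all assumptions made up for this example. It only shows one way an unset variable could be handled on a single-host CPU VM instead of failing on None.

```python
import os


def resolve_coordinator_address(env=None, default_host="localhost", default_port=1234):
    """Hypothetical helper, not part of MaxText.

    Returns a "host:port" string for the coordinator. Falls back to
    default_host:default_port when JAX_COORDINATOR_ADDRESS is unset or empty,
    which is the case that seems to be unhandled on a single CPU VM.
    """
    env = os.environ if env is None else env
    raw = env.get("JAX_COORDINATOR_ADDRESS")

    # Unset (None) or empty: assume a single-host run and use the fallback.
    if not raw:
        return f"{default_host}:{default_port}"

    # Accept either "host" or "host:port" (IPv6 literals are not handled here).
    host, sep, port = raw.partition(":")
    if sep and port:
        return f"{host}:{port}"
    return f"{host}:{default_port}"


if __name__ == "__main__":
    # With the variable unset, this prints the assumed single-host fallback.
    print(resolve_coordinator_address({}))
```

Whether a localhost fallback is the right behavior for multi-host CPU setups is a separate question; raising a clear error when the variable is missing would also be better than the current failure.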