This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Horovod

This recipe shows how to run the Horovod distributed training framework for TensorFlow using Batch AI.

Details

  • The standard Horovod tensorflow_mnist.py example is used.
  • tensorflow_mnist.py downloads its training data on its own during execution.
  • The job runs in the standard TensorFlow container tensorflow/tensorflow:1.8.0-gpu. You can run the same job directly on the GPU nodes by choosing Ubuntu DSVM as the image and removing the container settings from the job definition.
  • The Horovod framework is installed in the container using the job preparation command line. Note that you can build your own Docker image containing TensorFlow and Horovod instead.
  • The standard output of the job is stored on an Azure File Share.
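
The details above can be summarized as a small sketch of what the job consists of. This is only an illustration: the dictionary keys and the build_job_sketch helper are hypothetical and are not the Batch AI job schema, the preparation command `pip install horovod` is an assumption (the actual recipe may install additional packages), and the mpirun host list is omitted because it depends on the cluster. See the notebook or the CLI instructions for the real job definition.

```python
import shlex


def build_job_sketch(node_count, gpus_per_node):
    """Return a plain dict describing the pieces of this recipe's job.

    The keys are hypothetical and for illustration only; they do not
    follow the Batch AI job schema.
    """
    # Horovod is typically launched with one worker process per GPU.
    total_processes = node_count * gpus_per_node
    train_cmd = ["mpirun", "-np", str(total_processes),
                 "python", "tensorflow_mnist.py"]
    return {
        "node_count": node_count,
        # Container named in the recipe details.
        "container_image": "tensorflow/tensorflow:1.8.0-gpu",
        # Assumed preparation step that installs Horovod in the container.
        "job_preparation_command": "pip install horovod",
        # Host list is left out on purpose; it depends on the cluster.
        "command_line": " ".join(shlex.quote(part) for part in train_cmd),
        "stdout_location": "Azure File Share",
    }


job = build_job_sketch(node_count=2, gpus_per_node=1)
print(job["command_line"])
```

Removing the container_image entry from such a description corresponds to running the job directly on the GPU nodes, as mentioned in the details above.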

Instructions to Run Recipe

Python Jupyter Notebook

You can find the Jupyter Notebook for this recipe in Horovod-Tensorflow.ipynb.

Azure CLI 2.0

You can find the Azure CLI 2.0 instructions for this recipe in cli-instructions.md.

License Notice

Under construction...

Help or Feedback


If you have any problems or questions, you can reach the Batch AI team at [email protected] or you can create an issue on GitHub.

We also welcome your contributions of additional sample notebooks, scripts, or other examples of working with Batch AI.