This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Horovod

This recipe shows how to run the Horovod distributed training framework for TensorFlow using Batch AI.

Details

  • The standard Horovod tensorflow_mnist.py example is used;
  • tensorflow_mnist.py downloads the training data itself during execution;
  • The job runs in the standard TensorFlow container tensorflow/tensorflow:1.8.0-gpu. You can run the same job directly on the GPU nodes by choosing Ubuntu DSVM as the image and removing the container settings from the job definition;
  • The Horovod framework is installed in the container by the job preparation command line. Alternatively, you can build your own Docker image that already contains TensorFlow and Horovod;
  • The job's standard output is stored on an Azure File Share.
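The setup in the list above can be summarized as a plain Python dictionary. This is a minimal sketch, not the Batch AI API: the function build_job_spec, its field names, the pip-based preparation command and the example paths are hypothetical. Only the container image and the tensorflow_mnist.py script come from this recipe. Launching the script on several nodes with MPI is not modeled here.

```python
# Hypothetical sketch of the job described in this recipe.
# The field names are illustrative only; they are NOT the Batch AI job schema.

# Container image named in the recipe.
TF_IMAGE = "tensorflow/tensorflow:1.8.0-gpu"

# Assumption: Horovod is installed with pip during job preparation.
# Depending on the image, building Horovod may also require an MPI
# implementation and a C++ compiler to be installed first.
PREP_COMMAND = "pip install horovod"


def build_job_spec(node_count: int, script_dir: str, output_share: str,
                   use_container: bool = True) -> dict:
    """Return a hypothetical job description as a plain dict."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    spec = {
        "node_count": node_count,
        # Runs on each node before training starts.
        "job_preparation_command": PREP_COMMAND,
        # Standard Horovod example; it downloads MNIST on its own.
        "command": f"python {script_dir}/tensorflow_mnist.py",
        # Standard output goes to a mounted Azure File Share (path is hypothetical).
        "std_out_err_path": output_share,
    }
    if use_container:
        spec["container_image"] = TF_IMAGE
    # Without a container (e.g. running directly on an Ubuntu DSVM image),
    # the container setting is simply omitted.
    return spec


if __name__ == "__main__":
    print(build_job_spec(2, "/mnt/scripts", "/mnt/afs/logs"))
```

Passing use_container=False mirrors the DSVM variant mentioned above: everything stays the same except that no container image is specified.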

Instructions to Run Recipe

Python Jupyter Notebook

You can find the Jupyter Notebook for this recipe in Horovod-Tensorflow.ipynb.

Azure CLI 2.0

You can find the Azure CLI 2.0 instructions for this recipe in cli-instructions.md.

License Notice

Under construction...

Help or Feedback


If you have any problems or questions, you can reach the Batch AI team at [email protected] or you can create an issue on GitHub.

We also welcome your contributions of additional sample notebooks, scripts, or other examples of working with Batch AI.