From efded9612bcd1860e8feb0ba6188155cd91ce7e4 Mon Sep 17 00:00:00 2001 From: Christoph Schranz Date: Mon, 6 Mar 2023 13:22:07 +0100 Subject: [PATCH] adding tensorboard examples --- .../tensorboard_with_pytorch.ipynb | 289 +++ .../tensorboard_with_tensorflow.ipynb | 715 ++++++++++++++++++ 2 files changed, 1004 insertions(+) create mode 100644 extra/Getting_Started/tensorboard/tensorboard_with_pytorch.ipynb create mode 100644 extra/Getting_Started/tensorboard/tensorboard_with_tensorflow.ipynb diff --git a/extra/Getting_Started/tensorboard/tensorboard_with_pytorch.ipynb b/extra/Getting_Started/tensorboard/tensorboard_with_pytorch.ipynb new file mode 100644 index 0000000..1de5193 --- /dev/null +++ b/extra/Getting_Started/tensorboard/tensorboard_with_pytorch.ipynb @@ -0,0 +1,289 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# How to use TensorBoard with PyTorch\n", + "\n", + "Copyright: see https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html\n", + "\n", + "TensorBoard is a visualization toolkit for machine learning experimentation. \n", + "TensorBoard allows tracking and visualizing metrics such as loss and accuracy, \n", + "visualizing the model graph, viewing histograms, displaying images and much more. \n", + "In this tutorial we cover TensorBoard installation, \n", + "basic usage with PyTorch, and how to visualize the data you logged in the TensorBoard UI.\n", + "\n", + "## Installation\n", + "PyTorch should be installed to log models and metrics into the TensorBoard log \n", + "directory. 
The following command will install PyTorch 1.4+ via \n", + "Anaconda (recommended):\n", + "\n", + "::\n", + "\n", + " $ conda install pytorch torchvision -c pytorch \n", + " \n", + "\n", + "or pip\n", + "\n", + "::\n", + "\n", + " $ pip install torch torchvision\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using TensorBoard in PyTorch\n", + "\n", + "Let’s now try using TensorBoard with PyTorch! Before logging anything, \n", + "we need to create a ``SummaryWriter`` instance.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023-03-06 10:53:12.601699: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", + "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "2023-03-06 10:53:12.711871: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-03-06 10:53:13.121665: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:53:13.121710: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:53:13.121717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" + ] + } + ], + "source": [ + "import torch\n", + "from torch.utils.tensorboard import SummaryWriter\n", + "writer = SummaryWriter()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The writer will output to the ``./runs/`` directory by default.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Log scalars\n", + "\n", + "In machine learning, it’s important to understand key metrics such as \n", + "loss and how they change during training. Logging scalars lets you save \n", + "the loss value at each training step, or the accuracy after each epoch. \n", + "\n", + "To log a scalar value, use \n", + "``add_scalar(tag, scalar_value, global_step=None, walltime=None)``. \n", + "For example, let's create a simple linear regression training loop and \n", + "log the loss value using ``add_scalar``.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "x = torch.arange(-5, 5, 0.1).view(-1, 1)\n", + "y = -5 * x + 0.1 * torch.randn(x.size())\n", + "\n", + "model = torch.nn.Linear(1, 1)\n", + "criterion = torch.nn.MSELoss()\n", + "optimizer = torch.optim.SGD(model.parameters(), lr=0.1)\n", + "\n", + "def train_model(epochs):\n", + " for epoch in range(epochs):\n", + " y1 = model(x)\n", + " loss = criterion(y1, y)\n", + " writer.add_scalar(\"Loss/train\", loss, epoch)\n", + " optimizer.zero_grad()\n", + " loss.backward()\n", + " optimizer.step()\n", + " \n", + "train_model(10)\n", + "writer.flush()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call the ``flush()`` method to make sure that all pending events \n", + "have been written to disk.\n", + "\n", + "See [torch.utils.tensorboard 
tutorials](https://pytorch.org/docs/stable/tensorboard.html) \n", + "to find more TensorBoard visualization types you can log.\n", + "\n", + "If you do not need the summary writer anymore, call the ``close()`` method.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "writer.close()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run TensorBoard\n", + "\n", + "Install TensorBoard through the command line to visualize the data you logged:\n", + "\n", + "::\n", + "\n", + " $ pip install tensorboard\n", + "\n", + "\n", + "Now, start TensorBoard, specifying the root log directory you used above. \n", + "The ``logdir`` argument points to the directory where TensorBoard will look to find \n", + "event files that it can display. TensorBoard will recursively walk \n", + "the directory structure rooted at logdir, looking for .*tfevents.* files.\n", + "\n", + "::\n", + "\n", + " $ tensorboard --logdir=runs\n", + "\n", + "Go to the URL it provides, or to [http://localhost:6006/](http://localhost:6006/).\n", + "\n", + "\n", + "\n", + "This dashboard shows how the loss and accuracy change with every epoch. \n", + "You can also use it to track training speed, learning rate, and other \n", + "scalar values. It’s helpful to compare these metrics across different \n", + "training runs to improve your model.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Share TensorBoard dashboards\n", + "\n", + "[TensorBoard.dev](https://tensorboard.dev/) lets you upload and share \n", + "your ML experiment results with anyone. 
Use TensorBoard.dev to host, \n", + "track, and share your TensorBoard dashboards.\n", + "\n", + "Install the latest version of TensorBoard to use the uploader.\n", + "\n", + ":: \n", + "\n", + " $ pip install tensorboard --upgrade\n", + "\n", + "Use a simple command to upload and share your TensorBoard.\n", + "\n", + ":: \n", + "\n", + " $ tensorboard dev upload --logdir runs \\\n", + " --name \"My latest experiment\" \\ # optional\n", + " --description \"Simple comparison of several hyperparameters\" # optional\n", + "\n", + "For help, run ``$ tensorboard dev --help``.\n", + "\n", + "**Note:** Uploaded TensorBoards are public and visible to everyone. \n", + "Do not upload sensitive data.\n", + "\n", + "View your TensorBoard live at the URL provided in your terminal. \n", + "E.g. [https://tensorboard.dev/experiment/AdYd1TgeTlaLWXx6I8JUbA](https://tensorboard.dev/experiment/AdYd1TgeTlaLWXx6I8JUbA)\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
**Note:** TensorBoard.dev currently supports scalars, graphs, histograms, distributions, hparams, and text dashboards.
\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Learn More\n", + "\n", + "- [torch.utils.tensorboard](https://pytorch.org/docs/stable/tensorboard.html) docs\n", + "- [Visualizing models, data, and training with TensorBoard](https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html) tutorial\n", + "\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/extra/Getting_Started/tensorboard/tensorboard_with_tensorflow.ipynb b/extra/Getting_Started/tensorboard/tensorboard_with_tensorflow.ipynb new file mode 100644 index 0000000..c33509d --- /dev/null +++ b/extra/Getting_Started/tensorboard/tensorboard_with_tensorflow.ipynb @@ -0,0 +1,715 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "SB93Ge748VQs" + }, + "source": [ + "##### Copyright 2019 The TensorFlow Authors." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "cellView": "form", + "id": "0sK8X2O9bTlz", + "tags": [] + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HEYuO5NFwDK9" + }, + "source": [ + "# Get started with TensorBoard\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "56V5oun18ZdZ" + }, + "source": [ + "In machine learning, to improve something you often need to be able to measure it. TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics like loss and accuracy, visualizing the model graph, projecting embeddings to a lower dimensional space, and much more.\n", + "\n", + "This quickstart will show how to quickly get started with TensorBoard. The remaining guides in this website provide more details on specific capabilities, many of which are not included here. " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "6B95Hb6YVgPZ", + "tags": [] + }, + "outputs": [], + "source": [ + "# Load the TensorBoard notebook extension\n", + "%load_ext tensorboard" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "_wqSAZExy6xV", + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023-03-06 10:51:15.118111: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", + "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "2023-03-06 10:51:15.217654: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-03-06 10:51:15.671297: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:51:15.671357: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load 
dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:51:15.671377: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" + ] + } + ], + "source": [ + "import tensorflow as tf\n", + "import datetime" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "Ao7fJW1Pyiza", + "tags": [] + }, + "outputs": [], + "source": [ + "# Clear any logs from previous runs\n", + "!rm -rf ./logs/ " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z5pr9vuHVgXY" + }, + "source": [ + "Using the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset as the example, normalize the data and write a function that creates a simple Keras model for classifying the images into 10 classes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "j-DHsby18cot", + "tags": [] + }, + "outputs": [], + "source": [ + "mnist = tf.keras.datasets.mnist\n", + "\n", + "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n", + "x_train, x_test = x_train / 255.0, x_test / 255.0\n", + "\n", + "def create_model():\n", + " return tf.keras.models.Sequential([\n", + " tf.keras.layers.Flatten(input_shape=(28, 28), name='layers_flatten'),\n", + " tf.keras.layers.Dense(512, activation='relu', name='layers_dense'),\n", + " tf.keras.layers.Dropout(0.2, name='layers_dropout'),\n", + " tf.keras.layers.Dense(10, activation='softmax', name='layers_dense_2')\n", + " ])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XKUjdIoV87um" + }, + "source": [ + "## Using TensorBoard with Keras Model.fit()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8CL_lxdn8-Sv" + }, + "source": [ + "When training with Keras's [Model.fit()](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#fit), adding the `tf.keras.callbacks.TensorBoard` callback ensures that logs are created and stored. Additionally, enable histogram computation every epoch with `histogram_freq=1` (this is off by default)\n", + "\n", + "Place the logs in a timestamped subdirectory to allow easy selection of different training runs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "WAQThq539CEJ", + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023-03-06 10:51:16.792703: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:16.816039: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:16.816173: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:16.816790: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", + "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "2023-03-06 10:51:16.817127: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:16.817237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:16.817332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:17.308525: I 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:17.308658: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:17.308758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:51:17.308838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6574 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/5\n", + "1875/1875 [==============================] - 11s 6ms/step - loss: 0.2203 - accuracy: 0.9343 - val_loss: 0.1090 - val_accuracy: 0.9677\n", + "Epoch 2/5\n", + "1875/1875 [==============================] - 10s 6ms/step - loss: 0.0959 - accuracy: 0.9710 - val_loss: 0.0805 - val_accuracy: 0.9748\n", + "Epoch 3/5\n", + "1875/1875 [==============================] - 10s 6ms/step - loss: 0.0688 - accuracy: 0.9782 - val_loss: 0.0641 - val_accuracy: 0.9805\n", + "Epoch 4/5\n", + "1875/1875 [==============================] - 10s 5ms/step - loss: 0.0534 - accuracy: 0.9834 - val_loss: 0.0631 - val_accuracy: 0.9814\n", + "Epoch 5/5\n", + "1875/1875 [==============================] - 10s 5ms/step - loss: 0.0426 - accuracy: 0.9862 - val_loss: 0.0681 - val_accuracy: 0.9779\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = create_model()\n", + 
"model.compile(optimizer='adam',\n", + " loss='sparse_categorical_crossentropy',\n", + " metrics=['accuracy'])\n", + "\n", + "log_dir = \"logs/fit/\" + datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n", + "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)\n", + "\n", + "model.fit(x=x_train, \n", + " y=y_train, \n", + " epochs=5, \n", + " validation_data=(x_test, y_test), \n", + " callbacks=[tensorboard_callback])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "asjGpmD09dRl" + }, + "source": [ + "Start TensorBoard through the command line or within a notebook experience. The two interfaces are generally the same. In notebooks, use the `%tensorboard` line magic. On the command line, run the same command without \"%\"." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "A4UKgTLb9fKI", + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%tensorboard --logdir logs/fit --bind_all" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MCsoUNb6YhGc" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gi4PaRm39of2" + }, + "source": [ + "A brief overview of the visualizations created in this example and the dashboards (tabs in top navigation bar) where they can be found:\n", + "\n", + "* **Scalars** show how the loss and metrics change with every epoch. You can use them to also track training speed, learning rate, and other scalar values. Scalars can be found in the **Time Series** or **Scalars** dashboards.\n", + "* **Graphs** help you visualize your model. In this case, the Keras graph of layers is shown which can help you ensure it is built correctly. 
Graphs can be found in the **Graphs** dashboard.\n", + "* **Histograms** and **Distributions** show the distribution of a Tensor over time. This can be useful to visualize weights and biases and verify that they are changing in an expected way. Histograms can be found in the **Time Series** or **Histograms** dashboards. Distributions can be found in the **Distributions** dashboard.\n", + "\n", + "Additional TensorBoard dashboards are automatically enabled when you log other types of data. For example, the Keras TensorBoard callback lets you log images and embeddings as well. You can see what other dashboards are available in TensorBoard by clicking on the \"inactive\" dropdown towards the top right." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nB718NOH95yG" + }, + "source": [ + "## Using TensorBoard with other methods\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IKNt0nWs-Ekt" + }, + "source": [ + "When training with methods such as [`tf.GradientTape()`](https://www.tensorflow.org/api_docs/python/tf/GradientTape), use `tf.summary` to log the required information.\n", + "\n", + "Use the same dataset as above, but convert it to `tf.data.Dataset` to take advantage of batching capabilities:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "nnHx4DsMezy1", + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023-03-06 10:52:11.798727: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 376320000 exceeds 10% of free system memory.\n", + "2023-03-06 10:52:11.954962: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 376320000 exceeds 10% of free system memory.\n" + ] + } + ], + "source": [ + "train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n", + "test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))\n", + "\n", + "train_dataset = train_dataset.shuffle(60000).batch(64)\n", + 
"test_dataset = test_dataset.batch(64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SzpmTmJafJ10" + }, + "source": [ + "The training code follows the [advanced quickstart](https://www.tensorflow.org/tutorials/quickstart/advanced) tutorial, but shows how to log metrics to TensorBoard. Choose loss and optimizer:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "H2Y5-aPbAANs", + "tags": [] + }, + "outputs": [], + "source": [ + "loss_object = tf.keras.losses.SparseCategoricalCrossentropy()\n", + "optimizer = tf.keras.optimizers.Adam()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cKhIIDj9Hbfy" + }, + "source": [ + "Create stateful metrics that can be used to accumulate values during training and logged at any point:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "jD0tEWrgH0TL", + "tags": [] + }, + "outputs": [], + "source": [ + "# Define our metrics\n", + "train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)\n", + "train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('train_accuracy')\n", + "test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32)\n", + "test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('test_accuracy')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "szw_KrgOg-OT" + }, + "source": [ + "Define the training and test functions:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "TTWcJO35IJgK", + "tags": [] + }, + "outputs": [], + "source": [ + "def train_step(model, optimizer, x_train, y_train):\n", + " with tf.GradientTape() as tape:\n", + " predictions = model(x_train, training=True)\n", + " loss = loss_object(y_train, predictions)\n", + " grads = tape.gradient(loss, model.trainable_variables)\n", + " optimizer.apply_gradients(zip(grads, model.trainable_variables))\n", + "\n", + " train_loss(loss)\n", + " train_accuracy(y_train, predictions)\n", + 
"\n", + "def test_step(model, x_test, y_test):\n", + " predictions = model(x_test)\n", + " loss = loss_object(y_test, predictions)\n", + "\n", + " test_loss(loss)\n", + " test_accuracy(y_test, predictions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nucPZBKPJR3A" + }, + "source": [ + "Set up summary writers to write the summaries to disk in a different logs directory:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "3Qp-exmbWf4w", + "tags": [] + }, + "outputs": [], + "source": [ + "current_time = datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n", + "train_log_dir = 'logs/gradient_tape/' + current_time + '/train'\n", + "test_log_dir = 'logs/gradient_tape/' + current_time + '/test'\n", + "train_summary_writer = tf.summary.create_file_writer(train_log_dir)\n", + "test_summary_writer = tf.summary.create_file_writer(test_log_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qgUJgDdKWUKF" + }, + "source": [ + "Start training. Use `tf.summary.scalar()` to log metrics (loss and accuracy) during training/testing within the scope of the summary writers to write the summaries to disk. You have control over which metrics to log and how often to do it. Other `tf.summary` functions enable logging other types of data." 
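, + "\n", + "As an aside (a sketch not taken from the original tutorial), other `tf.summary` functions follow the same pattern; for example, `tf.summary.histogram` can log weight distributions. A self-contained sketch, with a hypothetical log directory:\n", + "\n", + "```python\n", + "import tensorflow as tf\n", + "\n", + "# hypothetical demo directory, separate from the tutorial's logs\n", + "writer = tf.summary.create_file_writer(\"logs/hist_demo\")\n", + "dense = tf.keras.layers.Dense(4)\n", + "dense.build((None, 8))  # create the layer's weights\n", + "\n", + "with writer.as_default():\n", + "    for step in range(3):\n", + "        for w in dense.trainable_variables:\n", + "            # ':' is not allowed in tag names, so strip the ':0' suffix\n", + "            tf.summary.histogram(w.name.replace(\":\", \"_\"), w, step=step)\n", + "writer.flush()\n", + "```"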
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "odWvHPpKJvb_", + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023-03-06 10:52:12.230397: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 376320000 exceeds 10% of free system memory.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1, Loss: 0.24209801852703094, Accuracy: 92.87166595458984, Test Loss: 0.1255762279033661, Test Accuracy: 96.43000030517578\n", + "Epoch 2, Loss: 0.10410388559103012, Accuracy: 96.8550033569336, Test Loss: 0.08265193551778793, Test Accuracy: 97.25999450683594\n", + "Epoch 3, Loss: 0.07300176471471786, Accuracy: 97.75166320800781, Test Loss: 0.06737181544303894, Test Accuracy: 97.89999389648438\n", + "Epoch 4, Loss: 0.05432678759098053, Accuracy: 98.3133316040039, Test Loss: 0.0691867545247078, Test Accuracy: 97.73999786376953\n", + "Epoch 5, Loss: 0.04201069846749306, Accuracy: 98.6883316040039, Test Loss: 0.06955532729625702, Test Accuracy: 97.80999755859375\n" + ] + } + ], + "source": [ + "model = create_model() # reset our model\n", + "\n", + "EPOCHS = 5\n", + "\n", + "for epoch in range(EPOCHS):\n", + " for (x_train, y_train) in train_dataset:\n", + " train_step(model, optimizer, x_train, y_train)\n", + " with train_summary_writer.as_default():\n", + " tf.summary.scalar('loss', train_loss.result(), step=epoch)\n", + " tf.summary.scalar('accuracy', train_accuracy.result(), step=epoch)\n", + "\n", + " for (x_test, y_test) in test_dataset:\n", + " test_step(model, x_test, y_test)\n", + " with test_summary_writer.as_default():\n", + " tf.summary.scalar('loss', test_loss.result(), step=epoch)\n", + " tf.summary.scalar('accuracy', test_accuracy.result(), step=epoch)\n", + " \n", + " template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'\n", + " print (template.format(epoch+1,\n", + " train_loss.result(), \n", + " 
train_accuracy.result()*100,\n", + " test_loss.result(), \n", + " test_accuracy.result()*100))\n", + "\n", + " # Reset metrics every epoch\n", + " train_loss.reset_states()\n", + " test_loss.reset_states()\n", + " train_accuracy.reset_states()\n", + " test_accuracy.reset_states()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JikosQ84fzcA" + }, + "source": [ + "Open TensorBoard again, this time pointing it at the new log directory. We could have also started TensorBoard to monitor training while it progresses." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "-Iue509kgOyE", + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "ERROR: Failed to launch TensorBoard (exited with 1).\n", + "Contents of stderr:\n", + "2023-03-06 10:52:56.420237: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", + "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "2023-03-06 10:52:56.514948: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-03-06 10:52:56.918161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:52:56.918207: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:52:56.918215: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n", + "2023-03-06 10:52:57.517025: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:52:57.542940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:52:57.543080: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "\n", + "NOTE: Using experimental fast data loading logic. To disable, pass\n", + " \"--load_fast=false\" and report issues on GitHub. More details:\n", + " https://github.com/tensorflow/tensorboard/issues/4784\n", + "\n", + "Address already in use\n", + "Port 6006 is in use by another program. Either identify and stop that program, or start the server with a different port." + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%tensorboard --logdir logs/gradient_tape --bind_all" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NVpnilhEgQXk" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ozbwXgPIkCKV" + }, + "source": [ + "That's it! You have now seen how to use TensorBoard both through the Keras callback and through `tf.summary` for more custom scenarios. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vsowjhkBdkbK" + }, + "source": [ + "## TensorBoard.dev: Host and share your ML experiment results\n", + "\n", + "[TensorBoard.dev](https://tensorboard.dev) is a free public service that enables you to upload your TensorBoard logs and get a permalink that can be shared with everyone in academic papers, blog posts, social media, etc. This can enable better reproducibility and collaboration.\n", + "\n", + "To use TensorBoard.dev, run the following command:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Q3nupQL24E5E", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-03-06 10:52:58.548075: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", + "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "2023-03-06 10:52:58.643597: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-03-06 10:52:59.047723: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:52:59.047780: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", + "2023-03-06 10:52:59.047788: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n", + "2023-03-06 10:52:59.645017: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:52:59.667592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2023-03-06 10:52:59.667740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "\n", + "***** TensorBoard Uploader *****\n", + "\n", + "This will upload your TensorBoard logs to https://tensorboard.dev/ from\n", + "the following directory:\n", + "\n", + "logs/fit\n", + "\n", + "This TensorBoard will be visible to everyone. Do not upload sensitive\n", + "data.\n", + "\n", + "Your use of this service is subject to Google's Terms of Service\n", + " and Privacy Policy\n", + ", and TensorBoard.dev's Terms of Service\n", + ".\n", + "\n", + "This notice will not be shown again while you are logged into the uploader.\n", + "To log out, run `tensorboard dev auth revoke`.\n", + "\n", + "Continue? (yes/NO) " + ] + } + ], + "source": [ + "!tensorboard dev upload \\\n", + " --logdir logs/fit \\\n", + " --name \"(optional) My latest experiment\" \\\n", + " --description \"(optional) Simple comparison of several hyperparameters\" \\\n", + " --one_shot" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lAgEh_Ow4EX6" + }, + "source": [ + "Note that this invocation uses the exclamation prefix (`!`) to invoke the shell,\n", + "rather than the percent prefix (`%`) to invoke the Colab magic. 
When invoking this command from the command line, there is no need for either prefix.\n", + "\n", + "View an example [here](https://tensorboard.dev/experiment/EDZb7XgKSBKo6Gznh3i8hg/#scalars).\n", + "\n", + "For more details on how to use TensorBoard.dev, see https://tensorboard.dev/#get-started" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "get_started.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}