README.md: 24 additions & 20 deletions
@@ -20,22 +20,22 @@ The solution supports two model configurations:
where:
* **Realtime streams (RTS)** is the number of concurrent streams that can be serviced by a single accelerator
- * **p99 latency** is the 99th-percentile latency to process a single 60 ms audio frame and return any predictions. Note that latency increases with more concurrent streams.
+ * **p99 latency** is the 99th-percentile latency to process a single 60 ms audio frame and return any predictions. Note that latency increases with the number of concurrent streams.
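The p99 figure above is a percentile over per-frame processing latencies, measured separately for each concurrent-stream count. A minimal sketch of that computation follows; the latency arrays are synthetic stand-ins and the harness is assumed, not this repository's benchmarking code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for measured per-frame processing latencies (ms),
# one array per number of concurrent realtime streams (RTS) being served.
# Values are drawn so latency grows with stream count, as noted above.
latencies_ms_by_rts = {
    rts: rng.gamma(shape=5.0, scale=0.2 * rts / 500, size=100_000)
    for rts in (500, 1000, 2000)
}

for rts, latencies_ms in latencies_ms_by_rts.items():
    # p99 latency: 99% of 60 ms frames are turned around at least this fast.
    p99_ms = np.percentile(latencies_ms, 99)
    print(f"{rts:>5} streams -> p99 frame latency ~= {p99_ms:.2f} ms")
```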
- <sup>§</sup>The `large` model inference performance figures are provisional.
- The **solution scales linearly with number of accelerators in the server** (tested up to 8000 RTS per server).
+ The **solution scales linearly up to 8 accelerators**, and we have measured a single server supporting **16,000 RTS** with the `base` model.
The `base` and `large` configurations are optimised for inference on FPGA with Myrtle's IP to achieve high utilisation of the available resources. They were chosen after hyperparameter searches on 10k-50k hrs of training data.
+ <sup>§</sup>The `large` model inference performance figures are provisional.
### Word Error Rates (WERs)
When training on the 50k hrs of open-source data described below, the solution has the following WERs:
These WERs are for streaming scenarios without additional forward context. Both configurations have a frame size of 60 ms, so, for a given segment of audio, the model sees between 0 and 60 ms of future context before making predictions.
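To make the 0-60 ms of future context concrete: a prediction covering any point in a frame is only made once that whole 60 ms frame has arrived, so the lookahead is simply the distance to the end of the enclosing frame. A small sketch of that arithmetic (the helper below is illustrative, not part of the training or inference code):

```python
import math

FRAME_MS = 60  # both configurations emit predictions once per 60 ms frame

def future_context_ms(t_ms: float, frame_ms: float = FRAME_MS) -> float:
    """Future audio (in ms) the model has seen by the time it first makes a
    prediction covering time t_ms, i.e. the gap to the end of t_ms's frame."""
    frame_end = math.floor(t_ms / frame_ms) * frame_ms + frame_ms
    return frame_end - t_ms

print(future_context_ms(120.0))   # 60.0 -> audio at the start of a frame
print(future_context_ms(150.0))   # 30.0 -> audio mid-frame
print(future_context_ms(179.99))  # ~0.0 -> audio just before the frame ends
```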
@@ -48,28 +48,32 @@ The 50k hrs of training data is a mixture of the following open-source datasets:
This data has a `maximum_duration` of 20s and a mean length of 12.75s.
- **<sup>*</sup>** None of these training data subsets include near-field unscripted utterances nor financial terminology. As such the Earnings21 benchmark is out-of-domain for these systems.
+ <sup>*</sup>None of these training data subsets include near-field unscripted utterances or financial terminology. As such, the Earnings21 benchmark is out-of-domain for these systems.
+ <sup>†</sup>`base` model WERs were not updated for the latest release. The provided values are from version [v1.6.1](https://github.com/MyrtleSoftware/myrtle-rnnt/releases/tag/v1.6.0).
### Training times <a name="train-timings"></a>
Training throughputs on an `8 x A100 (80GB)` system are as follows:
- | Model | Training time | Throughput | No. of updates | per-gpu `batch_size` | `GRAD_ACCUMULATION_BATCHES` |
* **Throughput** is the number of utterances seen per second during training (higher is better)
- * **No. of updates** is the number of optimiser steps at `GLOBAL_BATCH_SIZE=1024` that are required to train the models on the 50k hrs training dataset. You may need fewer steps when training with less data
- * **`GRAD_ACCUMULATION_BATCHES`** is the number of gradient accumulation steps per gpu required to achieve the `GLOBAL_BATCH_SIZE` of 1024. For all configurations the **per-gpu `batch_size`** is as large as possible meaning that `GRAD_ACCUMULATION_BATCHES` is set as small as possible.
+ * **No. of updates** is the number of optimiser steps at `--global_batch_size=1024` required to train the models on the 50k hrs training dataset. You may need fewer steps when training with less data.
+ * **`grad_accumulation_batches`** is the number of gradient accumulation steps performed on each GPU before taking an optimiser step.
+ * **`batch_split_factor`** is the number of sub-batches that the `PER_GPU_BATCH_SIZE` is split into before these sub-batches are passed through the joint network and loss.
+ For more details on these hyper-parameters, including how to set them, please refer to the [batch size arguments](training/docs/batch_size_hyperparameters.md) documentation.
- For more details on the batch size hyperparameters refer to the [Training Commands subsection of training/README.md](training/README.md#training). To get started with training see the [training/README.md](training/README.md).
+ To get started with training, see [training/README.md](training/README.md).
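The batch-size arguments above fit together as simple arithmetic. Under the usual data-parallel assumption (global batch = per-GPU batch × number of GPUs × gradient-accumulation steps, as the removed `GRAD_ACCUMULATION_BATCHES` bullet spelled out), and with placeholder values rather than the training script's real defaults, a sketch looks like this:

```python
# Illustrative arithmetic only -- assumed relationships and placeholder
# values, not code or defaults from this repository.
num_gpus = 8              # e.g. the 8 x A100 (80GB) system above
global_batch_size = 1024  # --global_batch_size=1024
per_gpu_batch_size = 32   # hypothetical value
batch_split_factor = 4    # hypothetical value

# grad_accumulation_batches: per-GPU batches accumulated before each
# optimiser step, so the effective batch reaches global_batch_size.
grad_accumulation_batches = global_batch_size // (per_gpu_batch_size * num_gpus)
assert per_gpu_batch_size * num_gpus * grad_accumulation_batches == global_batch_size

# batch_split_factor splits each per-GPU batch into smaller sub-batches for
# the joint network and loss (a memory saving; the effective batch size and
# the number of optimiser steps are unchanged).
joint_sub_batch_size = per_gpu_batch_size // batch_split_factor

# Rough optimiser updates per pass over ~50k hrs of audio with a mean
# utterance length of 12.75 s (figures quoted in the README above).
total_utterances = int(50_000 * 3600 / 12.75)
updates_per_epoch = total_utterances // global_batch_size

print(grad_accumulation_batches)  # 4
print(joint_sub_batch_size)       # 8
print(updates_per_epoch)          # roughly 13.8k with these figures
```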