An efficient OpenFST-based tool for calculating WER and aligning two transcript sequences.

revdotcom/fstalign

Latest commit: b10f0ea · Jan 27, 2025

fstalign

Overview

fstalign is a tool for aligning two sequences of tokens (hereafter referred to as the “reference” and the “hypothesis”). It has two key functions: computing word error rate (WER) and aligning NLP-formatted references with CTM hypotheses.

Due to its use of OpenFST and lazy algorithms for text-based alignment, fstalign is efficient for calculating WER while also providing significant flexibility for different measurement features and error analysis.
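
For readers new to the metric, the WER mentioned above is conventionally defined as follows. This is the standard textbook definition, not a formula taken from the fstalign source, and the symbols S, D, I and N are introduced here only for illustration:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

Here S, D and I are the numbers of substituted, deleted and inserted tokens in the hypothesis relative to the reference under the chosen alignment, and N is the number of tokens in the reference. For example, 2 substitutions, 1 deletion and 1 insertion against a 20-token reference give a WER of 4/20 = 0.2.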

Installation

Dependencies

We use git submodules to manage third-party dependencies. Initialize and update submodules before proceeding to the main build steps.

    git submodule update --init --recursive

This will pull the current dependencies:

  • catch2 - for unit testing
  • spdlog - for logging
  • CLI11 - for CLI construction
  • csv - for CTM and NLP input parsing
  • jsoncpp - for JSON output construction
  • strtk - for various string utilities

Additionally, we have dependencies outside of the third-party submodules:

  • OpenFST - currently provided to the build system either by setting the $OPENFST_ROOT environment variable or by passing -DOPENFST_ROOT to the CMake command.
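
A minimal sketch of the environment-variable route, assuming OpenFST was installed under the hypothetical prefix /opt/openfst (substitute your own path). The header check is only a sanity test added here for illustration; it is not something the fstalign build requires you to run:

```shell
# Hypothetical install prefix; keep any value already set in the environment.
export OPENFST_ROOT="${OPENFST_ROOT:-/opt/openfst}"

# An OpenFST install prefix normally contains include/fst/fstlib.h.
if [ -f "$OPENFST_ROOT/include/fst/fstlib.h" ]; then
    openfst_status="found"
else
    openfst_status="missing"
fi

echo "OpenFST headers under $OPENFST_ROOT: $openfst_status"
```

If the status is "missing", the prefix probably points at the wrong directory, and the CMake step is unlikely to locate OpenFST.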

Build

The current build framework is CMake. Install CMake following the instructions here (https://cmake.org/install/).

To build fstalign, run:

    mkdir build && cd build
    cmake .. -DOPENFST_ROOT="<path to OpenFST>" -DDYNAMIC_OPENFST=ON
    make

Note: -DDYNAMIC_OPENFST=ON is needed if OpenFST at OPENFST_ROOT is compiled as shared libraries. Otherwise static libraries are assumed.
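
The choice between the two modes can be sketched as below. This assumes a Linux-style layout in which shared libraries are named libfst.so* and live under $OPENFST_ROOT/lib; the /opt/openfst default is a hypothetical placeholder, and other platforms or lib64 layouts would need a different check. The snippet only prints the CMake command it would suggest, so it is safe to run anywhere:

```shell
# Hypothetical install prefix; keep any value already set in the environment.
OPENFST_ROOT="${OPENFST_ROOT:-/opt/openfst}"

# If a shared libfst is present, request dynamic linking;
# otherwise omit the flag so static libraries are assumed.
if ls "$OPENFST_ROOT"/lib/libfst.so* >/dev/null 2>&1; then
    dynamic_flag=" -DDYNAMIC_OPENFST=ON"
else
    dynamic_flag=""
fi

cmake_cmd="cmake .. -DOPENFST_ROOT=\"$OPENFST_ROOT\"$dynamic_flag"
echo "$cmake_cmd"
```

Run the printed command from inside the build directory, then run make as usual.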

Finally, tests can be run using:

    make test

Docker

The fstalign docker image is hosted on Docker Hub and can be easily pulled and run:

    docker pull revdotcom/fstalign
    docker run --rm -it revdotcom/fstalign

See https://hub.docker.com/r/revdotcom/fstalign/tags for the available versions/tags to pull. To run the tool on local files, mount the local directories that contain them with the -v flag of the docker run command.
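
As a sketch of the -v usage, the helper below mounts one host directory into the container. The function name fstalign_shell and the /data mount point are hypothetical choices made for this example; only the docker run flags and the image name come from the commands above:

```shell
# Hypothetical helper: start the fstalign container with a host directory
# mounted at /data (an arbitrary mount point chosen for this example).
fstalign_shell() {
    host_dir="${1:-$PWD}"   # directory holding your transcripts; defaults to the current one
    docker run --rm -it -v "$host_dir:/data" revdotcom/fstalign
}
```

Calling fstalign_shell "$PWD/transcripts" (a hypothetical directory) would then make those files visible under /data inside the container.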

From inside the container:

    /fstalign/build/fstalign --help

For development you can also build the docker image locally using:

    docker build . -t fstalign-dev

Documentation

For more information on how to use fstalign, see our documentation.