
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications


Code for the paper BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications, to appear at the IEEE Spoken Language Technology Workshop (SLT 2022).

Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are later used to extract ATC named entities, e.g., aircraft callsigns. One common challenge is speech activity detection (SAD) and speaker diarization (SD). In the failure condition, two or more segments remain in the same recording, jeopardizing the overall performance (see the figure below). We propose a system that combines SAD and a BERT model to perform speaker change detection and speaker role detection (SRD) by chunking ASR transcripts, i.e., SD with a defined number of speakers, together with SRD. The proposed model is evaluated on real-life public ATC databases. Our BERT SD baseline reaches up to 10% and 20% token-based Jaccard error rate (JER) on public and private ATC databases. We also achieved relative improvements of 32% in JER and 7.7% in SD error rate (DER) compared to VBx, a well-known SD system.

Our system

Pipeline for BERT-based text diarization.
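At a high level, the pipeline tags every token of an ASR transcript with a speaker role (e.g., ATCO or pilot), and a speaker change is hypothesized wherever the predicted tag changes. The snippet below is a minimal, hypothetical sketch of that post-processing idea; the tag names and the helper function are illustrative, not the repository's actual label set or code:

from itertools import groupby

def tokens_to_turns(tokens, tags):
    """Group consecutive tokens that share the same speaker-role tag into turns.

    tokens: words of an ASR transcript
    tags:   one speaker-role tag per token (illustrative labels, e.g. "atco"/"pilot")
    """
    turns = []
    for tag, group in groupby(zip(tokens, tags), key=lambda pair: pair[1]):
        turns.append((tag, " ".join(word for word, _ in group)))
    return turns

# Toy example: an ATCO instruction followed by the pilot's readback
tokens = ("csa one two tree bravo descend flight level eight zero "
          "descending flight level eight zero csa one two tree bravo").split()
tags = ["atco"] * 10 + ["pilot"] * 10
for speaker, text in tokens_to_turns(tokens, tags):
    print(f"{speaker}: {text}")
# atco: csa one two tree bravo descend flight level eight zero
# pilot: descending flight level eight zero csa one two tree bravo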

Pretrained resources on the HuggingFace Hub:

  • Fine-tuned BERT-base-uncased (token classification) on UWB-ATCC data: https://huggingface.co/Jzuluaga/bert-base-token-classification-for-atc-en-uwb-atcc
  • UWB-ATCC corpus prepared in the datasets library format: https://huggingface.co/datasets/Jzuluaga/uwb_atcc
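If you just want to try the released checkpoint, a minimal sketch with the HuggingFace transformers token-classification pipeline is shown below. It assumes the checkpoint loads as a standard token-classification model and that transformers is installed; for the paper's exact setup, use the scripts in src/ instead:

from transformers import pipeline

# Assumption: the Hub checkpoint is compatible with the generic
# token-classification pipeline (not verified against the repo's own scripts).
tagger = pipeline(
    "token-classification",
    model="Jzuluaga/bert-base-token-classification-for-atc-en-uwb-atcc",
    aggregation_strategy="simple",  # merge sub-word pieces into word-level groups
)

text = "csa one two tree bravo contact praha radar one two zero decimal two seven five good bye"
for group in tagger(text):
    print(group["entity_group"], group["word"], round(float(group["score"]), 3))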

Repository written by: Juan Pablo Zuluaga.



Preparing Environment

The first step is to create your environment with the required packages for data preparation, formatting, and running the experiments. You can run the following commands to create the conda environment (assuming CUDA 11.7):

  • Step 1: using Python 3.10, install Python and the requirements
git clone https://github.com/idiap/bert-text-diarization-atc
conda create -n diarization python==3.10
conda activate diarization
python -m pip install -r requirements.txt

Before running any script, make sure you have the en_US locale set and that PYTHONPATH includes the repository root folder.

export LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
export PYTHONPATH=$PYTHONPATH:$(pwd) # assuming you are in the repository root folder

Usage

There are several steps to replicate/use our proposed models:

Download the Data

For our experiments, we used 3 public databases and 2 private databases (see Table 1 in the paper). We provide scripts to replicate some of the results ONLY for the public databases.

Go to the data folder and follow the step-by-step process (easy) in the README file.

TL;DR for 1 public & free corpus:

conda activate diarization
bash data/databases/uwb_atcc/data_prepare_uwb_atcc_corpus.sh
bash data/databases/uwb_atcc/exp_prepare_uwb_atcc_corpus.sh

The output folders should be in experiments/data/uwb_atcc/{train,test}.

Training one model

Here, we describe how to train one model with the UWB-ATCC corpus, which is free!

Most of the training and evaluation scripts are in the src/ folder. The training procedure is very simple. You can train a baseline model with UWB-ATCC by calling the high-level script:

bash src/train_one_model.sh \
  --dataset "uwb_atcc" \
  --train-data experiments/data/uwb_atcc/train/diarization/utt2text_tags \
  --test-data experiments/data/uwb_atcc/test/diarization/utt2text_tags \
  --output-dir "experiments/results/baseline"

Additionally, you can modify some training hyperparameters by calling train_diarization.py (which is called internally by src/train_one_model.sh) directly and passing values from the CLI, e.g., --train-batch-size 64 (instead of the default 32), or use another encoder, e.g., --input-model "bert-large-uncased":

python3 src/train_diarization.py \
    --report-to none \
    --epochs 4 \
    --seed 1234 \
    --max-train-samples -1 \
    --train-batch-size 32 \
    --eval-batch-size 16 \
    --warmup-steps 500 \
    --logging-steps 1000 \
    --save-steps 10000 \
    --eval-steps 500 \
    --max-steps 3000 \
    --input-model bert-base-uncased \
    --test-data experiments/data/uwb_atcc/test/diarization/utt2text_tags \
    experiments/data/uwb_atcc/train/diarization/utt2text_tags \
    experiments/results/baseline/bert-base-uncased/1234/uwb_atcc

Train baselines

We have prepared some scripts to replicate some baselines from our paper.

  1. Script to run and evaluate the baseline BERT models for UWB-ATCC and LDC-ATCC (see Table 3 in the paper):
bash train_baselines.sh
  2. Script to run and evaluate the BERT models with data augmentation for UWB-ATCC and LDC-ATCC (see Section 3.4 and Table 4 in the paper).

You can either train only one model (examples for the UWB-ATCC and LDC-ATCC corpora):

bash ablations/train_uwb_atcc_baseline_augmentation.sh
# or, for LDC-ATCC,
bash ablations/train_ldc_atcc_baseline_augmentation.sh

or you can train 5 models (per corpus) with different seeds:

bash ablations/train_uwb_atcc_5seeds_augmentation.sh
# or, for LDC-ATCC,
bash ablations/train_ldc_atcc_5seeds_augmentation.sh

Evaluate models (optional)

We have prepared two scripts to evaluate and perform inference with a given model, e.g., one trained and evaluated on the UWB-ATCC corpus:

  • To evaluate the model and print the metrics in the training folder:
bash src/eval_model.sh \
  --DATA "experiments/data" \
  --batch-size 16 \
  --dataset "uwb_atcc" \
  --output-dir "experiments/results/baseline"
  • To get outputs in the utt2text_tags format:
bash src/run_inference.sh \
  --DATA "experiments/data" \
  --batch-size 16 \
  --dataset "uwb_atcc" \
  --output-dir "experiments/results/baseline"

If you want to do something more specific, like training on the UWB-ATCC corpus and evaluating on the ATCO2-test-set corpus, you can use the Python script directly:

# this is the folder where the model is located
EXP_FOLDER=experiments/results/baseline/bert-base-uncased/1234/uwb_atcc/

python3 src/eval_diarization.py \
    --input-model "$EXP_FOLDER/final_checkpoint" \
    --batch-size 32 \
    --input-files "experiments/data/atco2_corpus/test/diarization/utt2text_tags" \
    --test-names "atco2_corpus" \
    --output-folder "$EXP_FOLDER/evaluations"

This will generate outputs in $EXP_FOLDER/evaluations, or in $EXP_FOLDER/inference if you use inference_diarization.py instead of eval_diarization.py.


Evaluate DER outputs of your model

Here, we describe briefly how to evaluate the outputs of your model with standard acoustic-based metrics, e.g., DER and JER.

This is especially useful when evaluating the model on ASR transcripts. Here, you first need to perform forced alignment to map text tokens to acoustic timings.

  1. You need to get the forced alignment between speech/transcription pairs using a forced-alignment toolkit, e.g., the Kaldi aligner, to obtain a CTM file, which looks like this:

uwb_atcc_augmented_00000_C 1 0.09 0.05 wizz 1.00 
uwb_atcc_augmented_00000_C 1 0.14 0.04 air 1.00 
uwb_atcc_augmented_00000_C 1 0.19 0.07 four 1.00 
uwb_atcc_augmented_00000_C 1 0.26 0.05 nine 1.00 
uwb_atcc_augmented_00000_C 1 0.31 0.05 one 1.00 
uwb_atcc_augmented_00000_C 1 0.36 0.09 contact 1.00 
uwb_atcc_augmented_00000_C 1 0.45 0.07 praha 1.00 
uwb_atcc_augmented_00000_C 1 0.52 0.12 radar 1.00 
uwb_atcc_augmented_00000_C 1 0.64 0.06 one 1.00 
uwb_atcc_augmented_00000_C 1 0.70 0.05 two 1.00 
uwb_atcc_augmented_00000_C 1 0.75 0.08 zero 1.00 
uwb_atcc_augmented_00000_C 1 0.83 0.12 decimal 1.00 
uwb_atcc_augmented_00000_C 1 0.95 0.04 two 1.00 
uwb_atcc_augmented_00000_C 1 1.00 0.07 seven 1.00 
uwb_atcc_augmented_00000_C 1 1.08 0.09 five 1.00 
uwb_atcc_augmented_00000_C 1 1.17 0.03 good 1.00 
uwb_atcc_augmented_00000_C 1 1.20 0.09 bye 0.95 
  2. To evaluate the DER for a subset of the uwb_atcc corpus, you can check the required files in experiments/data/uwb_atcc_subset. To compute the DER on this subset, you can run:
bash src/eval_der.sh

We share this folder, which contains only a few examples for computing the acoustic-based DER.
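To illustrate the idea behind the acoustic-based evaluation, the sketch below combines CTM word timings with per-word speaker labels into time-based segments and scores them with pyannote.metrics. This is not the repository's src/eval_der.sh: the file names and the helper are hypothetical, and it assumes pyannote.core and pyannote.metrics are installed:

from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

def ctm_to_annotation(ctm_path, word_speakers):
    """Build a pyannote Annotation from a CTM file plus one speaker label per word.

    ctm_path:      CTM with lines "utt_id channel start duration word confidence"
    word_speakers: one speaker label per CTM line (e.g. "atco"/"pilot"), for
                   instance taken from the BERT tagger's per-token output.
    """
    annotation = Annotation()
    with open(ctm_path) as ctm:
        for line, speaker in zip(ctm, word_speakers):
            _, _, start, duration, _, _ = line.split()
            start, duration = float(start), float(duration)
            annotation[Segment(start, start + duration)] = speaker
    return annotation

# Hypothetical file names and labels, for illustration only.
reference = ctm_to_annotation("reference.ctm", ["atco"] * 15 + ["pilot"] * 2)
hypothesis = ctm_to_annotation("hypothesis.ctm", ["atco"] * 14 + ["pilot"] * 3)

metric = DiarizationErrorRate()
print(f"DER: {metric(reference, hypothesis):.3f}")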

Get the metrics

We prepared one script (get_metrics.py) that lists all the performances produced in $EXP_FOLDER/evaluations for a given model. For instance:

  • If your model on the UWB-ATCC corpus was trained in experiments/results/baseline/bert-base-uncased/1234/uwb_atcc, run:
bash src/get_metrics.sh --evaluation-folder experiments/results/baseline/bert-base-uncased/1234/uwb_atcc/evaluations

Related work

Here is a list of papers related to AI/ML for air traffic control communications:


How to cite us

If you use this code for your research, please cite our paper with:

Zuluaga-Gomez, J., Sarfjoo, S. S., Prasad, A., Nigmatulina, I., Motlicek, P., Ondrej, K., Ohneiser, O., & Helmke, H. (2022). BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar.

or use the bibtex item:

@article{zuluaga2022bertraffic,
  title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications},
  author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and Nigmatulina, Iuliia and Motlicek, Petr and Ondrej, Karel and Ohneiser, Oliver and Helmke, Hartmut},
  journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
  year={2022}
}

and,

@article{zuluaga2022atco2,
  title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications},
  author={Zuluaga-Gomez, Juan and Vesel{\`y}, Karel and Sz{\"o}ke, Igor and Motlicek, Petr and others},
  journal={arXiv preprint arXiv:2211.04054},
  year={2022}
}
