This repository contains an implementation of the Short-Time Objective Intelligibility (STOI) metric [1], an intrusive objective measure (it requires the clean reference signal) used to predict speech intelligibility in noisy environments. STOI is widely used to evaluate hearing aid algorithms, speech enhancement systems, and machine-learning-based intelligibility predictors.
This implementation was developed as part of a B.Sc. thesis, and the STOI-derived d-matrices were later used as inputs for neural networks.
The STOI metric is a computationally efficient way to predict speech intelligibility based on the correlation of short-time temporal envelopes of clean and noisy speech.
It follows these main steps:
- Preprocessing
  - Converts audio to mono and resamples to 10 kHz.
  - Removes silent frames based on an energy threshold.
- Time-Frequency Analysis
  - Computes the Short-Time Fourier Transform (STFT).
  - Groups STFT bins into one-third octave bands (mimicking human auditory perception).
- Short-Time Segmentation
  - Divides signals into overlapping 30-frame windows.
  - Normalizes and clips noisy speech based on reference signal energy.
- D-Matrix Computation
  - Calculates frame-wise correlation between clean and noisy signals.
  - Stores these correlations in a structured d-matrix.
- Final STOI Score
  - Averages all correlation values to get the final STOI intelligibility score.
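The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in this repository: all function names are hypothetical, and the constants (256-sample Hann frames with 50% overlap, a 512-point FFT, 15 bands starting at 150 Hz, a 40 dB dynamic range for silence removal, and a -15 dB clipping bound) are assumed from the reference paper. It also simplifies two things: the inputs are assumed to be mono and already at 10 kHz, and silent frames are simply dropped rather than removed and re-synthesized by overlap-add.

```python
import numpy as np

# Constants assumed from Taal et al. (2011), not read from this repository.
FS = 10_000       # sample rate expected by STOI (Hz)
N_FRAME = 256     # analysis frame length (samples)
N_FFT = 512       # FFT size after zero-padding
NUM_BANDS = 15    # number of one-third octave bands
MIN_FREQ = 150.0  # centre frequency of the lowest band (Hz)
N_SEG = 30        # frames per short-time segment
BETA = -15.0      # lower signal-to-distortion bound for clipping (dB)
DYN_RANGE = 40.0  # dynamic range for silent-frame removal (dB)


def frame_signal(x):
    """Split x into Hann-windowed frames with 50% overlap: (frames, N_FRAME)."""
    hop = N_FRAME // 2
    n = 1 + (len(x) - N_FRAME) // hop
    idx = np.arange(N_FRAME)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hanning(N_FRAME)


def remove_silent_frames(xf, yf):
    """Keep frames whose clean energy is within DYN_RANGE dB of the loudest."""
    energy = 20 * np.log10(np.linalg.norm(xf, axis=1) + 1e-12)
    keep = energy > energy.max() - DYN_RANGE
    return xf[keep], yf[keep]


def third_octave_matrix():
    """0/1 matrix (NUM_BANDS, bins) assigning FFT bins to one-third octave bands."""
    freqs = np.linspace(0, FS / 2, N_FFT // 2 + 1)
    centres = MIN_FREQ * 2.0 ** (np.arange(NUM_BANDS) / 3)
    lo, hi = centres * 2 ** (-1 / 6), centres * 2 ** (1 / 6)
    return ((freqs >= lo[:, None]) & (freqs < hi[:, None])).astype(float)


def band_envelopes(frames, obm):
    """Band-wise temporal envelopes, shape (NUM_BANDS, frames)."""
    spec = np.fft.rfft(frames, n=N_FFT, axis=1)
    return np.sqrt(obm @ (np.abs(spec) ** 2).T)


def d_matrix(X, Y):
    """Correlation per band and per 30-frame segment, shape (bands, segments)."""
    n_seg = X.shape[1] - N_SEG + 1
    d = np.empty((X.shape[0], n_seg))
    clip = 1 + 10 ** (-BETA / 20)
    for m in range(n_seg):
        xs, ys = X[:, m:m + N_SEG], Y[:, m:m + N_SEG]
        # Scale the noisy envelope to the clean energy, then clip it.
        alpha = np.linalg.norm(xs, axis=1, keepdims=True) / (
            np.linalg.norm(ys, axis=1, keepdims=True) + 1e-12)
        ys = np.minimum(alpha * ys, clip * xs)
        # Sample correlation coefficient along the time axis.
        xs = xs - xs.mean(axis=1, keepdims=True)
        ys = ys - ys.mean(axis=1, keepdims=True)
        d[:, m] = np.sum(xs * ys, axis=1) / (
            np.linalg.norm(xs, axis=1) * np.linalg.norm(ys, axis=1) + 1e-12)
    return d


def stoi_sketch(clean, noisy):
    """Return (score, d_matrix) for two equal-length mono signals at 10 kHz."""
    clean = np.asarray(clean, dtype=float)
    noisy = np.asarray(noisy, dtype=float)
    xf, yf = remove_silent_frames(frame_signal(clean), frame_signal(noisy))
    if xf.shape[0] < N_SEG:
        raise ValueError("not enough non-silent frames for one segment")
    obm = third_octave_matrix()
    d = d_matrix(band_envelopes(xf, obm), band_envelopes(yf, obm))
    return d.mean(), d
```

Calling `stoi_sketch(clean, noisy)` on two aligned arrays returns the averaged score together with the d-matrix, so the same matrix can be kept as a feature for later models. Because of the simplifications listed above, scores from this sketch should not be expected to match a reference implementation exactly.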
The dataset used to test the metric is the CPC1 dataset, which includes noisy speech signals and corresponding intelligibility scores obtained from tests with human listeners.
Clone the repository:
git clone https://github.com/George-P-1/stoi_Metric.git
cd stoi_Metric
The required dependencies are listed in requirements.txt. To install them, run:
pip install -r requirements.txt
A possible extension is to use neural networks to improve the predictions (see the neural networks project).
The dataset used to test the STOI metric was provided by The Clarity Project. The official pystoi implementation was used to validate this work.
Footnotes
[1] Cees H. Taal, Richard C. Hendriks, Richard Heusdens, and Jesper Jensen. “An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech”. In: IEEE Transactions on Audio, Speech, and Language Processing 19.7 (Sept. 2011), pp. 2125–2136. doi: 10.1109/TASL.2011.2114881.