Philipp Götz1, Georg Götz2, Nils Meyer-Kahlen2, Kyung Yun Lee2, Karolina Prawda2, Emanuël A. P. Habets1, and Sebastian J. Schlecht3
1International Audio Laboratories Erlangen, Germany
2Acoustics Lab, Dpt. of Information and Communications Engineering, Aalto University, Finland
3Friedrich-Alexander-University Erlangen-Nuremberg (FAU), Germany
The multimodal dataset described in the IWAENC 2024 publication, including room impulse responses (RIRs) and 360° photos of each measurement position, is hosted at Zenodo (https://zenodo.org/records/11388246).
This is the accompanying code repository for the blind energy decay function (EDF) estimation method proposed in the paper and includes the source code of the model, the training routines and some basic visualizing of the results.
The Python packages required to run the code can be installed from the requirements.txt file:
pip install -r requirements.txt
The model training and evaluation data is hosted on Google Drive and can be downloaded by running download.sh. Depending on whether a GPU is available, create a softlink to either cpu.yaml or gpu.yaml in configs/local.
The pre-generated dataset from Google Drive is constructed as described in the paper. In a preliminary step, the blind T60 estimator is trained by running:
python src/train.py -cn train model=baseline_t60 hydra=baseline_t60
Upon convergence, the trained model serves as a baseline that computes linear EDCs from blind T60 estimates. As an additional, non-blind baseline (operating directly on RIRs rather than on speech), a pre-trained DecayFitNet is used to generate multi-slope EDCs. The blind EDC estimator is trained using:
python src/train.py -cn train model=baseline_edc
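The linear-EDC baseline maps a blind T60 estimate to a straight-line decay in dB: by definition, the energy drops by 60 dB over T60 seconds. A minimal sketch of that mapping (function and variable names are illustrative, not taken from this repository):

```python
import numpy as np

def linear_edc_from_t60(t60, duration, fs=16000):
    """Linear energy decay curve (in dB) implied by a single T60 estimate.

    The energy decays by 60 dB over t60 seconds, so the EDC falls
    linearly at a rate of -60 / t60 dB per second.
    """
    t = np.arange(int(duration * fs)) / fs  # time axis in seconds
    return -60.0 * t / t60                  # EDC in dB, starting at 0 dB

# Example: a blind estimate of T60 = 0.5 s over a 1-second window.
edc = linear_edc_from_t60(t60=0.5, duration=1.0)
```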
More information on DecayFitNet can be found at https://github.com/georg-goetz/DecayFitNet.
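Multi-slope decay models of the kind DecayFitNet fits describe the EDC as a sum of exponential decays, optionally plus a noise floor. The sketch below illustrates that idea only; parameter names and the noise handling are assumptions, so refer to the DecayFitNet repository for the actual model:

```python
import numpy as np

def multi_slope_edc(t, amplitudes, t60s, noise_level=0.0):
    """Multi-slope energy decay: a sum of exponentials plus a noise floor.

    Each slope k decays by 60 dB over t60s[k] seconds, i.e. with rate
    exp(-t * 6 * ln(10) / t60s[k]) in linear energy.
    """
    edc = np.full_like(t, noise_level, dtype=float)
    for a, t60 in zip(amplitudes, t60s):
        edc += a * np.exp(-t * 6.0 * np.log(10.0) / t60)
    return 10.0 * np.log10(edc)  # convert linear energy to dB

# Example: a fast early slope plus a weaker, slower late slope.
t = np.linspace(0.0, 1.0, 1000)
edc_db = multi_slope_edc(t, amplitudes=[1.0, 0.01], t60s=[0.2, 1.0])
```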