IAB label classification assigns audio clips to IAB content categories. We used the Audioset dataset to train this model; more information on how we created the dataset can be found here. In this codebase, we fine-tune PANNs [1], in the same way this model was fine-tuned, to build an audio clip classification system.
View our full report on how we created the dataset and our model here
View our datasets and our Audioset labels to IAB labels mappings (for both 5 and 20 labels) here
We also expanded this model to use 20 IAB labels here
View the notebook here if you would like to run it yourself
1. Requirements
Python 3.6 + PyTorch 1.0
2. Then simply run:
$ ./runme.sh
Or run the commands in runme.sh line by line. The commands include:
(1) Modify the paths to the dataset and your workspace
(2) Extract features (see the sketch after this list)
(3) Train model
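As a rough illustration of the feature-extraction step, the sketch below computes log-mel spectrograms with the parameters reported for PANNs in [1] (32 kHz audio, 1024-sample window, 320-sample hop, 64 mel bins). This is only a sketch; the repo's actual extraction code and parameters live in its own scripts and may differ.

```python
# Illustrative log-mel feature extraction (a sketch, not this repo's exact code).
# Parameters follow the PANNs paper [1]; the repo's own scripts may differ.
import librosa
import numpy as np

def extract_logmel(path, sr=32000, n_fft=1024, hop=320, n_mels=64):
    """Load an audio clip and return a (frames, n_mels) log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop,
        n_mels=n_mels, fmin=50, fmax=14000)
    return librosa.power_to_db(mel).T.astype(np.float32)

# features = extract_logmel("example.wav")  # hypothetical input file
```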
A 14-layer CNN (CNN14) from PANNs [1] is fine-tuned. We use 10-fold cross-validation for IAB label classification; in each fold, 900 audio clips are used for training and 100 for validation.
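For concreteness, here is a minimal sketch of this transfer-learning setup: a pretrained CNN14 backbone with a new linear head for the IAB labels, evaluated with a 10-fold split. The wrapper name `IABClassifier` and the 1,000-clip dataset size are assumptions for illustration; the 2048-dimensional embedding comes from the public PANNs release, and this repo's actual training code may differ.

```python
# Minimal fine-tuning sketch (illustrative; this repo's training code may differ).
# Assumes the CNN14 backbone from the public PANNs release [1], whose forward
# pass returns a dict with a 2048-dimensional "embedding".
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

class IABClassifier(nn.Module):  # hypothetical wrapper name
    def __init__(self, cnn14, num_labels=5):
        super().__init__()
        self.backbone = cnn14                    # pretrained CNN14
        self.head = nn.Linear(2048, num_labels)  # new IAB classification head

    def forward(self, waveform):
        emb = self.backbone(waveform)["embedding"]
        return self.head(emb)

# 10-fold split over an assumed 1,000-clip dataset: 900 train / 100 val per fold.
indices = list(range(1000))
for fold, (train_idx, val_idx) in enumerate(
        KFold(n_splits=10, shuffle=True, random_state=0).split(indices)):
    # Build DataLoaders from train_idx / val_idx, then fine-tune end to end
    # with a small learning rate, e.g. torch.optim.Adam(model.parameters(), 1e-4).
    pass
```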
[1] Kong, Qiuqiang, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." arXiv preprint arXiv:1912.10211 (2019).