Concurrent Speakers Counter

Estimate the number of concurrent speakers in single-channel mixtures, as a first step toward cracking the "cocktail-party" problem. The model is based on a Bidirectional Long Short-Term Memory (BLSTM) network, which takes both past and future temporal context into account.

1. The model of the paper

Layer	Layer Name	Input Shape	Output Shape
1	BLSTM_1	(?, 500, 201)	(?, 500, 60)
2	BLSTM_2	(?, 500, 60)	(?, 500, 40)
3	BLSTM_3	(?, 500, 40)	(?, 500, 80)
4	maxpooling1d	(?, 500, 80)	(?, 250, 80)
5	flatten	(?, 250, 80)	(?, 20000)
6	dense	(?, 20000)	(?, 11)
7	activation	(?, 11)	(?, 11)

"?" is the batch dimension, i.e. the number of samples passed through the network at once.
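The shapes in the table can be checked with a few lines of plain Python. This is only a sketch of the shape arithmetic, not the training code: the helper names are made up for illustration, and the per-direction unit counts (30, 20, 40) and the pool size of 2 are inferred from the output widths, assuming each BLSTM concatenates its forward and backward outputs.

```python
# Shape walk-through for the layer table. None stands in for the "?" batch axis.
# Assumptions (inferred from the table, not stated in it):
#   - each BLSTM concatenates forward and backward outputs (width = 2 * units)
#   - per-direction units are 30, 20, 40 and the pooling size is 2


def blstm(shape, units_per_direction):
    """Bidirectional LSTM returning full sequences."""
    batch, steps, _ = shape
    return (batch, steps, 2 * units_per_direction)


def max_pool_1d(shape, pool_size=2):
    """Pooling over the time axis shortens it by the pool size."""
    batch, steps, features = shape
    return (batch, steps // pool_size, features)


def flatten(shape):
    """Merge the time and feature axes into one vector per sample."""
    batch, steps, features = shape
    return (batch, steps * features)


def dense(shape, units):
    """Fully connected layer: only the last axis changes."""
    return (shape[0], units)


def trace_shapes(input_shape=(None, 500, 201)):
    """Return the output shape after the input and each of the first six layers."""
    shapes = [input_shape]
    for units in (30, 20, 40):
        shapes.append(blstm(shapes[-1], units))
    shapes.append(max_pool_1d(shapes[-1]))
    shapes.append(flatten(shapes[-1]))
    shapes.append(dense(shapes[-1], 11))
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes():
        print(shape)
```

The final width of 11 corresponds to one class per speaker count from 0 to 10; the activation layer leaves the shape unchanged.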

2. My Model

3. Dependencies

librosa
soundfile
Keras (tested with 2.1.1)
TensorFlow (tested with 1.4.0)
Anaconda3 (includes Python 3.5+)

4. Dataset

The dataset is LibriCount10 0dB:

simulated cocktail-party mixtures of 0 to 10 speakers
mixed at 0 dB SNR
5 seconds per recording
16-bit, 16 kHz, mono
11,440 samples, 832.5 MB
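The recording format ties in with the (500, 201) network input shape listed in section 1. The sketch below shows one way the numbers could line up; the STFT window of 400 samples and hop of 160 samples are assumptions chosen to match that shape, not values stated in this README, and the function name is hypothetical.

```python
# Relating the recording format to the (frames, bins) input shape.
# ASSUMPTION: 400-sample window (25 ms) and 160-sample hop (10 ms);
# these are illustrative values, not taken from this README.

SAMPLE_RATE = 16000  # Hz, from the dataset description
DURATION_S = 5       # seconds, from the dataset description


def expected_input_shape(sample_rate, duration_s, n_fft=400, hop=160):
    """Return (n_samples, n_frames, n_bins) for a magnitude spectrogram."""
    n_samples = sample_rate * duration_s
    n_frames = n_samples // hop   # one frame per hop, ignoring edge padding
    n_bins = n_fft // 2 + 1       # one-sided spectrum of a real signal
    return n_samples, n_frames, n_bins


if __name__ == "__main__":
    print(expected_input_shape(SAMPLE_RATE, DURATION_S))
```

Depending on how an STFT implementation pads the signal edges, it may return one frame more or fewer, in which case the result would need trimming to 500 frames.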

The annotation provides each speaker's sex, unique speaker_id, and vocal activity within the mixture recording, given in samples. The JSON file for a 3-speaker mixture looks like this:

[
    {
        "sex": "F",
        "activity": [[0, 51076], [51396, 55400], [56681, 80000]], 
        "speaker_id": 1221
    },
    {
        "sex": "F",
        "activity": [[0, 51877], [56201, 80000]],
        "speaker_id": 3570
    },
    {
        "sex": "M",
        "activity": [[0, 15681], [16161, 68213], [73498, 80000]], 
        "speaker_id": 5105
    }
]
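An annotation like the one above can be turned into a speaker count and per-speaker activity with the standard json module. This is a minimal sketch: the function names are hypothetical, and it assumes each activity pair is a [start, end) interval in samples at 16 kHz.

```python
import json

# Annotation copied from the 3-speaker example above.
ANNOTATION = """
[
    {"sex": "F", "activity": [[0, 51076], [51396, 55400], [56681, 80000]], "speaker_id": 1221},
    {"sex": "F", "activity": [[0, 51877], [56201, 80000]], "speaker_id": 3570},
    {"sex": "M", "activity": [[0, 15681], [16161, 68213], [73498, 80000]], "speaker_id": 5105}
]
"""

SAMPLE_RATE = 16000  # Hz, from the dataset description


def speaker_count(annotation):
    """The class label: one entry per speaker in the mixture."""
    return len(annotation)


def active_samples(entry):
    """Total active samples, assuming [start, end) intervals."""
    return sum(end - start for start, end in entry["activity"])


def active_seconds(entry, sample_rate=SAMPLE_RATE):
    """Active time in seconds for one speaker."""
    return active_samples(entry) / sample_rate


speakers = json.loads(ANNOTATION)

if __name__ == "__main__":
    print("speakers:", speaker_count(speakers))
    for entry in speakers:
        print(entry["speaker_id"], entry["sex"], active_seconds(entry))
```

The speaker count is the quantity the network predicts; the activity intervals are extra metadata that this sketch only summarizes.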

5. Reference Paper

The cocktail-party problem is notoriously hard to solve. Paper 2 is described as the first study on data-driven speaker count estimation, a first step toward cracking the problem. Thanks to the authors for their paper [Paper 2] and code, which helped me a lot. Their homepage is AudioLabs Erlangen CountNet.

Paper 1: Simon Leglaive, Romain Hennequin and Roland Badeau. Singing voice detection with deep recurrent neural networks (ICASSP 2015).
Paper 2: Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler and Emanuël A. P. Habets. Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation (ICASSP 2018).

6. Recommended links

7. Follow-up Work

I plan to keep working on speech separation over the long term. If you are interested, feel free to fork this repository and follow my ongoing work.
