Releases · pyannote/pyannote-audio

23 Jun 00:30

hbredin

3.3.1

4dd55a5

Version 3.3.1 (Latest)

Breaking changes

setup: drop support for Python 3.8

Fixes

fix: fix support for numpy==2.x (@ibevers)
fix: fix support for speechbrain==1.x (@Adel-Moumen)

14 Jun 08:41

hbredin

3.3.0

adaf770

Version 3.3.0

TL;DR

pyannote.audio does speech separation: multi-speaker audio in, one audio channel per speaker out!

pip install pyannote.audio[separation]==3.3.0

New features

feat(task): add PixIT joint speaker diarization and speech separation task (with @joonaskalda)
feat(model): add ToTaToNet joint speaker diarization and speech separation model (with @joonaskalda)
feat(pipeline): add SpeechSeparation pipeline (with @joonaskalda)
feat(io): add option to select torchaudio backend

Fixes

fix(task): fix wrong train/development split when training with (some) meta-protocols (#1709)
fix(task): fix metadata preparation with missing validation subset (@clement-pages)

Improvements

improve(io): when available, default to using soundfile backend
improve(pipeline): do not extract embeddings when max_speakers is set to 1
improve(pipeline): optimize memory usage of most pipelines (#1713 by @benniekiss)

08 May 09:51

hbredin

3.2.0

70a8507

Version 3.2.0

New features

feat(task): add option to cache task training metadata to speed up training (with @clement-pages)
feat(model): add receptive_field, num_frames and dimension to models (with @Bilal-Rahou)
feat(model): add fbank_only property to WeSpeaker models
feat(util): add Powerset.permutation_mapping to help with permutation in powerset space (with @FrenchKrab)
feat(sample): add sample file at pyannote.audio.sample.SAMPLE_FILE
feat(metric): add reduce option to diarization_error_rate metric (with @Bilal-Rahou)
feat(pipeline): add Waveform and SampleRate preprocessors

Fixes

fix(task): fix random generators and their reproducibility (with @FrenchKrab)
fix(task): fix estimation of training set size (with @FrenchKrab)
fix(hook): fix torch.Tensor support in ArtifactHook
fix(doc): fix typo in Powerset docstring (with @lukasstorck)

Improvements

improve(metric): add support for number of speakers mismatch in diarization_error_rate metric
improve(pipeline): track both Model and nn.Module attributes in Pipeline.to(device)
improve(io): switch to torchaudio >= 2.2.0
improve(doc): update tutorials (with @clement-pages)

Breaking changes

BREAKING(model): get rid of Model.example_output in favor of num_frames method, receptive_field property, and dimension property
BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)
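
Code that used Model.example_output mainly to learn how many output frames a model produces can now ask for that number directly. The sketch below is not pyannote.audio's implementation. It only shows the standard 1-D convolution length formula that a num_frames-style computation can be built on. The function names and the two-layer configuration are hypothetical, chosen for illustration.

```python
def conv1d_num_frames(num_samples, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (same formula as in the PyTorch Conv1d docs)."""
    return 1 + (num_samples + 2 * padding - dilation * (kernel_size - 1) - 1) // stride


def stack_num_frames(num_samples, layers):
    """Apply the length formula layer by layer.

    `layers` is a list of (kernel_size, stride) pairs, without padding or dilation.
    """
    for kernel_size, stride in layers:
        num_samples = conv1d_num_frames(num_samples, kernel_size, stride)
    return num_samples


# Hypothetical two-layer front end, NOT the configuration of any released model.
toy_layers = [(251, 10), (3, 3)]

# Number of output frames for one second of 16 kHz audio under that toy setup.
frames = stack_num_frames(16000, toy_layers)
print(frames)
```

The same layer-by-layer reasoning applies to the receptive field: each layer widens the span of input samples that a single output frame depends on.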

Community contributions

community: add tutorial for offline use of pyannote/speaker-diarization-3.1 (by @simonottenhauskenbun)

01 Dec 13:26

hbredin

3.1.1

6a972c0

Version 3.1.1

TL;DR

Providing num_speakers to pyannote/speaker-diarization-3.1 now works as expected.

Full changelog

Fixes

fix(pipeline): fix support for setting num_speakers in pyannote/speaker-diarization-3.1 pipeline

16 Nov 12:37

hbredin

3.1.0

f45da71

Version 3.1.0

TL;DR

pyannote/speaker-diarization-3.1 no longer requires the (unpopular) ONNX runtime.

Full changelog

New features

feat(model): add WeSpeaker embedding wrapper based on PyTorch
feat(model): add support for multi-speaker statistics pooling
feat(pipeline): add TimingHook for profiling processing time
feat(pipeline): add ArtifactHook for saving internal steps
feat(pipeline): add support for list of hooks with Hooks
feat(utils): add "soft" option to Powerset.to_multilabel
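
To give an idea of what a "soft" powerset-to-multilabel conversion can mean, here is a minimal pure-Python sketch. It assumes powerset classes are the speaker subsets of size 0 up to max_set_size, ordered smallest first, and that the input holds plain probabilities. Both are assumptions made for illustration, and the function names are hypothetical rather than the library's API.

```python
from itertools import combinations


def powerset_classes(num_speakers, max_set_size):
    """Enumerate speaker subsets of size 0..max_set_size, smallest subsets first."""
    classes = []
    for size in range(max_set_size + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def to_multilabel(powerset_probs, num_speakers, max_set_size, soft=False):
    """Convert one frame of powerset probabilities to per-speaker activations."""
    classes = powerset_classes(num_speakers, max_set_size)
    if soft:
        # Soft: per-speaker marginal, summing every class that contains the speaker.
        return [
            sum(p for p, members in zip(powerset_probs, classes) if speaker in members)
            for speaker in range(num_speakers)
        ]
    # Hard: pick the most likely class and mark its speakers as active.
    best = max(range(len(classes)), key=lambda i: powerset_probs[i])
    return [1.0 if speaker in classes[best] else 0.0 for speaker in range(num_speakers)]


# 3 speakers, at most 2 active at once: 1 + 3 + 3 = 7 classes.
probs = [0.1, 0.5, 0.1, 0.0, 0.3, 0.0, 0.0]
hard = to_multilabel(probs, num_speakers=3, max_set_size=2)
soft = to_multilabel(probs, num_speakers=3, max_set_size=2, soft=True)
```

The hard conversion discards how confident the model was, while the soft one keeps a graded score per speaker that downstream steps can threshold themselves.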

Fixes

fix(pipeline): add missing "embedding" hook call in SpeakerDiarization
fix(pipeline): fix AgglomerativeClustering to honor num_clusters when provided
fix(pipeline): fix frame-wise speaker count exceeding max_speakers or detected num_speakers in SpeakerDiarization pipeline

Improvements

improve(pipeline): compute fbank on GPU when requested

Breaking changes

BREAKING(pipeline): rename WeSpeakerPretrainedSpeakerEmbedding to ONNXWeSpeakerPretrainedSpeakerEmbedding
BREAKING(setup): remove onnxruntime dependency.
You can still use ONNX hbredin/wespeaker-voxceleb-resnet34-LM but you will have to install onnxruntime yourself.
BREAKING(pipeline): remove logging_hook (use ArtifactHook instead)
BREAKING(pipeline): remove onset and offset parameters in SpeakerDiarizationMixin.speaker_count
You should now binarize segmentations before passing them to speaker_count.
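
As an illustration of the binarize-then-count order described above, here is a small pure-Python sketch of hysteresis thresholding with onset and offset values, followed by a frame-wise count. It operates on plain lists rather than pyannote.core objects, and both helper names are hypothetical. It is a sketch of the idea, not the library's binarization utility.

```python
def binarize(scores, onset=0.5, offset=0.5):
    """Hysteresis thresholding of one speaker's frame-wise scores.

    A speaker becomes active when the score rises above `onset` and
    stays active until the score falls below `offset`.
    """
    active = False
    decisions = []
    for score in scores:
        if active:
            active = score >= offset
        else:
            active = score > onset
        decisions.append(int(active))
    return decisions


def count_active_speakers(binarized):
    """Frame-wise number of active speakers from per-speaker 0/1 lists."""
    return [sum(frame) for frame in zip(*binarized)]


# Toy scores for two speakers over five frames.
speaker_a = binarize([0.1, 0.6, 0.4, 0.2, 0.7], onset=0.5, offset=0.3)
speaker_b = binarize([0.9, 0.8, 0.1, 0.1, 0.6], onset=0.5, offset=0.3)
counts = count_active_speakers([speaker_a, speaker_b])
```

Doing the thresholding up front keeps the counting step free of any threshold parameters, which matches the direction of this breaking change.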

28 Sep 19:47

hbredin

3.0.1

28fcf50

Version 3.0.1

TL;DR

pyannote/speaker-diarization-3.0 is now much faster when sent to GPU.

import torch
from pyannote.audio import Pipeline

# Load the pretrained pipeline, then move it to GPU explicitly.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0")
pipeline.to(torch.device("cuda"))

Full changelog

Fixes and improvements

fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support

Dependencies update

setup: switch from onnxruntime to onnxruntime-gpu

26 Sep 13:00

hbredin

3.0.0

795b92a

Version 3.0.0

TL;DR

Better pretrained pipeline and model

Much better overlapping speech detection with powerset pyannote/segmentation-3.0
Much better speaker diarization performance with pyannote/speaker-diarization-3.0

Benchmark (DER %)	v2.1	v3.0
AISHELL-4	14.1	12.3
AliMeeting (channel 1)	27.4	24.3
AMI (IHM)	18.9	19.0
AMI (SDM)	27.1	22.2
AVA-AVD	-	49.1
DIHARD 3 (full)	26.9	21.7
MSDWild	-	24.6
REPERE (phase2)	8.2	7.8
VoxConverse (v0.3)	11.2	11.3
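
For readers less familiar with the metric, diarization error rate is usually defined as the sum of false alarm, missed detection and speaker confusion durations, divided by the total duration of reference speech. The sketch below applies that definition to made-up durations and computes the relative change for one row of the table. The function names are hypothetical, and the durations are illustrative only, not taken from any benchmark.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """DER as a fraction: summed error durations over total reference speech duration."""
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (false_alarm + missed_detection + confusion) / total


def relative_change(old, new):
    """Relative change from `old` to `new` (negative means the error rate went down)."""
    return (new - old) / old


# Made-up durations in seconds, for illustration only.
der = diarization_error_rate(
    false_alarm=12.0, missed_detection=30.0, confusion=18.0, total=600.0
)

# DIHARD 3 (full) row of the table: 26.9 in v2.1, 21.7 in v3.0.
dihard_change = relative_change(26.9, 21.7)
print(f"DER = {100 * der:.1f}%, DIHARD 3 relative change = {100 * dihard_change:.1f}%")
```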

Major breaking changes

BREAKING: pipelines now run on CPU by default
Use pipeline.to(torch.device('cuda')) to use GPU
BREAKING: removed SpeakerSegmentation pipeline
Use SpeakerDiarization pipeline instead
BREAKING: removed support for prodi.gy recipes

Full changelog

Features and improvements

feat(pipeline): send pipeline to device with pipeline.to(device)
feat(pipeline): add return_embeddings option to SpeakerDiarization pipeline
feat(pipeline): make segmentation_batch_size and embedding_batch_size mutable in SpeakerDiarization pipeline (they now default to 1)
feat(pipeline): add progress hook to pipelines
feat(task): add powerset support to SpeakerDiarization task
feat(task): add support for multi-task models
feat(task): add support for label scope in speaker diarization task
feat(task): add support for missing classes in multi-label segmentation task
feat(model): add segmentation model based on torchaudio self-supervised representation
feat(pipeline): check version compatibility at load time
improve(task): load metadata as tensors rather than pyannote.core instances
improve(task): improve error message on missing specifications

Breaking changes

BREAKING(task): rename Segmentation task to SpeakerDiarization
BREAKING(pipeline): pipeline defaults to CPU (use pipeline.to(device))
BREAKING(pipeline): remove SpeakerSegmentation pipeline (use SpeakerDiarization pipeline)
BREAKING(pipeline): remove segmentation_duration parameter from SpeakerDiarization pipeline (defaults to duration of segmentation model)
BREAKING(task): remove support for variable chunk duration for segmentation tasks
BREAKING(pipeline): remove support for FINCHClustering and HiddenMarkovModelClustering
BREAKING(setup): drop support for Python 3.7
BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
You should update how pyannote.audio.core.io.Audio is instantiated:
- replace Audio() by Audio(mono="downmix");
- replace Audio(mono=True) by Audio(mono="downmix");
- replace Audio(mono=False) by Audio().
BREAKING(model): get rid of (flaky) Model.introspection
If, for some weird reason, you wrote some custom code based on that,
you should instead rely on Model.example_output.
BREAKING(interactive): remove support for Prodigy recipes
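
The two BREAKING(io) entries above boil down to a small mapping, sketched here as plain Python helpers. The helper names are hypothetical and not part of pyannote.audio. They only encode the replacement rules listed for Audio(...) and the switch from 1-indexed to 0-indexed channels.

```python
_UNSET = object()  # sentinel meaning "the old code called Audio() without mono"


def migrate_mono_argument(old_mono=_UNSET):
    """Return the keyword arguments to pass to Audio(...) in 3.0.

    Old Audio() and Audio(mono=True) both downmixed to mono, so they map to
    mono="downmix". Old Audio(mono=False) maps to the new default, Audio().
    """
    if old_mono is _UNSET or old_mono is True:
        return {"mono": "downmix"}
    if old_mono is False:
        return {}
    raise ValueError(f"unexpected pre-3.0 mono value: {old_mono!r}")


def migrate_channel(old_channel):
    """Convert a 1-indexed (pre-3.0) channel number to a 0-indexed one."""
    if old_channel < 1:
        raise ValueError("pre-3.0 channels were 1-indexed")
    return old_channel - 1


new_default_kwargs = migrate_mono_argument()       # old Audio()
new_multichannel_kwargs = migrate_mono_argument(False)  # old Audio(mono=False)
first_channel = migrate_channel(1)
```

With these rules, existing code keeps its old behavior after upgrading: a call site would become Audio(**migrate_mono_argument(old_value)), or more simply be rewritten by hand following the three replacements above.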

Fixes and improvements

fix(pipeline): fix reproducibility issue with Ampere CUDA devices
fix(pipeline): fix support for IOBase audio
fix(pipeline): fix corner case with no speaker
fix(train): prevent metadata preparation from happening twice
fix(task): fix support for "balance" option
improve(task): shorten and improve structure of Tensorboard tags

Dependencies update

setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
setup: switch to speechbrain 0.5.14+

31 Jan 13:50

hbredin

2.1.1

460f7e7

Version 2.1.1

Version 2.1.x introduces a major overhaul of the default pyannote.audio speaker diarization pipeline, which is now made of three main stages:

neural speaker segmentation applied to a short sliding window;
neural speaker embedding of each (local) speaker;
(global) agglomerative clustering.

More details in the attached technical report.
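
To make the third stage more concrete, here is a toy pure-Python sketch of agglomerative clustering applied to a handful of speaker embeddings. The centroid linkage, cosine distance and threshold value are assumptions chosen for illustration. The actual pipeline's linkage and hyper-parameters are described in the technical report and are not reproduced here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def agglomerative_clustering(embeddings, threshold):
    """Greedy centroid-linkage clustering; returns one cluster label per embedding."""
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([embeddings[k] for k in clusters[i]])
                cj = centroid([embeddings[k] for k in clusters[j]])
                distance = cosine_distance(ci, cj)
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        # Stop merging once the closest pair is farther apart than the threshold.
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for k in members:
            labels[k] = label
    return labels


# Four toy "local speaker" embeddings that form two obvious groups.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = agglomerative_clustering(toy_embeddings, threshold=0.2)
```

Each resulting label plays the role of a global speaker identity, which is how local speakers found in separate sliding windows end up being merged into the same speaker.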

25 Nov 08:48

hbredin

1.1.1

c5de4f2

Version 1.1.1

chore: do not update to pyannote.pipeline >= 2.0
