Wrong usage of meta-protocols subsets in segmentation tasks #1709

FrenchKrab · 2024-05-15T05:37:11Z

Tested versions

Reproducible in 3.2.0, tested with a73ded2

System information

Linux / pyannote.audio 3.12 / pyannote.database 5.1.0 / Python 3.12

Issue description

In the mixins of the segmentation task, filtering is done using self.prepared_data["audio-metadata"]["subset"] == Subsets.index("train").
This works perfectly with normal protocols, but with meta-protocols it seems to rely on each file's "original" subset (the one from the underlying protocol), not the subset assigned by the meta-protocol.

For example, with this meta-protocol:

Protocols:
  X:
    SpeakerDiarization:
      MyMETA:
        train:
          MyProtocol.SpeakerDiarization.A: ['development']
        development:
          MyProtocol.SpeakerDiarization.A: ['development']

the 'train' subset will be considered empty (and pyannote will throw errors).
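
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not the pyannote.audio code: the SUBSETS list, the file dictionaries, and the "meta_subset" key are all hypothetical stand-ins, assuming each file yielded by the meta-protocol still carries the subset label of the protocol it came from.

```python
# Assumed subset ordering; a stand-in for the Subsets list used by the task.
SUBSETS = ["train", "development", "test"]

# Hypothetical files as MyMETA might yield them. Both meta subsets draw from
# MyProtocol's 'development' set, so every file keeps subset="development".
# "meta_subset" is an illustrative key recording where MyMETA placed the file.
files = [
    {"uri": "file_1", "meta_subset": "train", "subset": "development"},
    {"uri": "file_2", "meta_subset": "train", "subset": "development"},
    {"uri": "file_1", "meta_subset": "development", "subset": "development"},
    {"uri": "file_2", "meta_subset": "development", "subset": "development"},
]

train_index = SUBSETS.index("train")

# Filtering on the original label: nothing matches "train".
train_by_original = [f for f in files if SUBSETS.index(f["subset"]) == train_index]

# Filtering on the subset assigned by the meta-protocol: two files match.
train_by_meta = [f for f in files if SUBSETS.index(f["meta_subset"]) == train_index]

print(len(train_by_original))  # 0
print(len(train_by_meta))      # 2
```

Under these assumptions, a filter keyed on the original label sees an empty training set even though the meta-protocol defines one.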

I haven't tested it, but I suppose it "fails silently" (i.e. ignores the set) in other cases where there is still some data to train on:

Protocols:
  X:
    SpeakerDiarization:
      MyMETA:
        train:
          SomeOtherProtocol.SpeakerDiarization.A: ['train']
          MyProtocol.SpeakerDiarization.A: ['development']
        development:
          MyProtocol.SpeakerDiarization.A: ['development']
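
The suspected silent failure for this second configuration can be sketched the same way. Again, this is only an illustration under assumptions: the SUBSETS list, the file entries, and the "database" key are hypothetical, and the filter is a simplified stand-in for the comparison quoted in the issue description, not the actual task code.

```python
# Assumed subset ordering; a stand-in for the Subsets list used by the task.
SUBSETS = ["train", "development", "test"]

# Hypothetical contents of MyMETA's "train" subset in the second configuration.
# Each file keeps the subset label of the protocol it originally came from.
meta_train = [
    {"uri": "other_1", "database": "SomeOtherProtocol", "subset": "train"},
    {"uri": "other_2", "database": "SomeOtherProtocol", "subset": "train"},
    {"uri": "mine_1", "database": "MyProtocol", "subset": "development"},
]

train_index = SUBSETS.index("train")

# Files that survive a filter on the original subset label.
kept = [f["uri"] for f in meta_train if SUBSETS.index(f["subset"]) == train_index]

# Files that belong to the meta "train" subset but are filtered out.
dropped = [f["uri"] for f in meta_train if SUBSETS.index(f["subset"]) != train_index]

print(kept)     # ['other_1', 'other_2']
print(dropped)  # ['mine_1']
```

Because the training set is not empty here, no error would be raised; the MyProtocol files would simply be left out of training.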

Minimal reproduction example (MRE)

https://colab.research.google.com/drive/1kCy30rYG8fWltJfc_xPuX8AdL28y1gMc?usp=sharing


hbredin · 2024-05-15T05:39:46Z

@clement-pages any idea?


hbredin added a commit that referenced this issue May 15, 2024

fix: fix #1709

535c37a

hbredin mentioned this issue May 15, 2024

fix: fix #1709 #1710

Merged

hbredin added a commit that referenced this issue May 17, 2024

fix(task): fix incorrect train/dev split with (some) meta-protocols (#1709)

cad8bea

FrenchKrab closed this as completed May 29, 2024
