Skip to content

Voice Activity Detection in R using the "Silero" model

License

Notifications You must be signed in to change notification settings

bnosac/audio.vadsilero

Folders and files

NameName
Last commit message
Last commit date
Jan 29, 2024
Mar 18, 2024
Jan 29, 2024
Mar 18, 2024
Jan 29, 2024
Jan 29, 2024
Jan 29, 2024
Mar 18, 2024
Jan 29, 2024
Mar 18, 2024
Mar 18, 2024
Jan 29, 2024
Jan 29, 2024

Repository files navigation

audio.vadsilero

This repository contains an R package which is a torch R wrapper around the Silero Voice Activity Detection model.

The package was created with as main goal to remove non-speech audio segments before doing an automatic transcription using audio.whisper to avoid transcription hallucinations. It contains

  • functions to detect the location of voice in audio using a the .jit model from Silero implemented in torch
  • it has similar functionality as audio.vadwebrtc

Installation

  • The package is currently not on CRAN
  • For the development version of this package: remotes::install_github("bnosac/audio.vadsilero")

Look to the documentation of the functions: help(package = "audio.vadsilero")

Example

Get a audio file in 16 bit with mono PCM samples (pcm_s16le codec) with a sampling rate of either 8Khz, 16KHz or 32Khz

library(audio.vadsilero)
file <- system.file(package = "audio.vadsilero", "extdata", "test_wav.wav")
vad  <- silero(file, milliseconds = 96)
vad
Voice Activity Detection 
  - file: D:/Jan/R/win-library/4.1/audio.vadsilero/extdata/test_wav.wav 
  - sample rate: 16000 
  - VAD type: silero, VAD mode: , VAD by milliseconds: 96, VAD frame_length: 1536
    - Percent of audio containing a voiced signal: 76.6% 
    - Seconds voiced: 4.7 
    - Seconds unvoiced: 1.4 
vad$vad_segments
 vad_segment start   end has_voice
           1 0.000 0.672     FALSE
           2 0.768 2.592      TRUE
           3 2.688 2.688     FALSE
           4 2.784 3.168      TRUE
           5 3.264 3.648     FALSE
           6 3.744 4.896      TRUE
           7 4.992 4.992     FALSE
           8 5.088 6.432      TRUE
           9 6.528 6.912     FALSE

Example of a simple plot of these audio and voice segments

library(wav)
x <- read_wav(file)
plot(seq_along(x) / 16000, x, type = "l", xlab = "Seconds", ylab = "Signal")
lines(vad$vad$millisecond / 1000, vad$vad$probability *  max(x), type = "l", col = "red", lwd = 2)

Support in text mining

Need support in text mining? Contact BNOSAC: http://www.bnosac.be

About

Voice Activity Detection in R using the "Silero" model

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages