This week I wrote audio.vadwebrtc because I needed a quick way to remove segments without voice from audio files. I want to transcribe audio files with audio.whisper, and that model hallucinates on audio segments containing only silence.
I looked at this chunk of code in package av (lines 652 to 670 in 58d7026), and it only handles one start_time / total_time.
My code to extract only the parts of an audio file that contain voice looks like this:
> library(av)
> library(audio.vadwebrtc)
> file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
> vad <- VAD(file, mode = "normal")
> vad$vad_segments
  vad_segment start  end has_voice
1           1  0.00 0.08     FALSE
2           2  0.09 3.30      TRUE
3           3  3.31 3.71     FALSE
4           4  3.72 6.78      TRUE
5           5  6.79 6.99     FALSE
>
> voiced <- subset(vad$vad_segments, vad$vad_segments$has_voice == TRUE)
> voiced$file <- sprintf("%s.wav", voiced$vad_segment)
> voiced
  vad_segment start  end has_voice  file
2           2  0.09 3.30      TRUE 2.wav
4           4  3.72 6.78      TRUE 4.wav
> for(i in seq_len(nrow(voiced))){
+ av_audio_convert(file, output = voiced$file[i],
+ start_time = voiced$start[i],
+ total_time = voiced$end[i] - voiced$start[i])
+ }
Output #0, wav, to 'D:\Jan\Dropbox\Work\RForgeBNOSAC\BNOSAC\audio.vadwebrtc\2.wav':
Metadata:
ISFT : Lavf58.29.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Adding audio frame 28 at timestamp 3.42sec - audio stream completed!
Output #0, wav, to 'D:\Jan\Dropbox\Work\RForgeBNOSAC\BNOSAC\audio.vadwebrtc\4.wav':
Metadata:
ISFT : Lavf58.29.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Adding audio frame 26 at timestamp 6.79sec - audio stream completed!
>
Would it technically be possible to allow multiple start_time / total_time values, so that the segments are all combined into 1 file? Then I could write something like this:
av_audio_convert(file, output = "test.wav", start_time = voiced$start, total_time = voiced$end - voiced$start)
generating 1 output file.
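To make the request concrete, here is a rough sketch in plain R of the combining step, applied to samples that are already decoded. It is only an illustration and says nothing about how av would implement it: `concat_segments` is a hypothetical helper (not part of av), and the toy signal at 100 Hz is an assumption standing in for real decoded audio.

```r
# Hypothetical helper, not part of av: keep only the samples that fall inside
# the given [start, end] windows (in seconds) and join them in order.
concat_segments <- function(samples, sample_rate, start, end) {
  stopifnot(length(start) == length(end), all(end >= start))
  # Convert seconds to 1-based sample indices, clamped to the signal length.
  from <- pmax(round(start * sample_rate) + 1, 1)
  to   <- pmin(round(end * sample_rate), length(samples))
  # Build one index vector covering every window, skipping empty windows.
  idx <- unlist(Map(function(a, b) if (a <= b) seq(a, b) else integer(0),
                    from, to))
  samples[idx]
}

# Toy mono signal (assumption): 7 seconds at 100 Hz, each value is its own index.
sample_rate <- 100
samples <- seq_len(7 * sample_rate)

# Voiced segments copied from the VAD output above.
voiced <- data.frame(start = c(0.09, 3.72), end = c(3.30, 6.78))
combined <- concat_segments(samples, sample_rate, voiced$start, voiced$end)

# Duration of the combined signal in seconds.
length(combined) / sample_rate
```

The combined duration equals the sum of the voiced segment lengths, (3.30 - 0.09) + (6.78 - 3.72) seconds, which is what a vectorised start_time / total_time would be expected to produce in a single output file.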