av_audio_convert multiple start_time/total_time

This week I wrote [audio.vadwebrtc](https://github.com/bnosac/audio.vadwebrtc) because I needed a quick way to remove segments without voice from audio files. I transcribe audio files with [audio.whisper](https://github.com/bnosac/audio.whisper), and that model hallucinates on audio segments that contain only silence.

I looked at this chunk of code in the av package: https://github.com/ropensci/av/blob/58d702683261d23fa7620a42aabfe776705b50a7/src/video.c#L652-L670
and it only handles a single start_time / total_time pair.
My code to extract only the voiced parts of an audio file currently looks like this:

```
> library(av)
> library(audio.vadwebrtc)
> file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
> vad <- VAD(file, mode = "normal")
> vad$vad_segments
  vad_segment start  end has_voice
1           1  0.00 0.08     FALSE
2           2  0.09 3.30      TRUE
3           3  3.31 3.71     FALSE
4           4  3.72 6.78      TRUE
5           5  6.79 6.99     FALSE
> 
> voiced <- subset(vad$vad_segments, vad$vad_segments$has_voice == TRUE)
> voiced$file <- sprintf("%s.wav", voiced$vad_segment)
> voiced
  vad_segment start  end has_voice  file
2           2  0.09 3.30      TRUE 2.wav
4           4  3.72 6.78      TRUE 4.wav
> for(i in seq_len(nrow(voiced))){
+     av_audio_convert(file, output = voiced$file[i], 
+                      start_time = voiced$start[i], 
+                      total_time = voiced$end[i] - voiced$start[i])
+ }
Output #0, wav, to 'D:\Jan\Dropbox\Work\RForgeBNOSAC\BNOSAC\audio.vadwebrtc\2.wav':
  Metadata:
    ISFT            : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Adding audio frame 28 at timestamp 3.42sec - audio stream completed!
Output #0, wav, to 'D:\Jan\Dropbox\Work\RForgeBNOSAC\BNOSAC\audio.vadwebrtc\4.wav':
  Metadata:
    ISFT            : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Adding audio frame 26 at timestamp 6.79sec - audio stream completed!
>
```

Would it be technically possible to allow multiple start_time / total_time values so that the segments are all combined into one file? Then I could write something like `av_audio_convert(file, output = "test.wav", start_time = voiced$start, total_time = voiced$end - voiced$start)` and get a single output file.

For reference, the linked lines from `src/video.c`:

	SEXP R_convert_audio(SEXP audio, SEXP out_file, SEXP out_format, SEXP out_channels,
	                     SEXP sample_rate, SEXP start_pos, SEXP max_len){
	  output_container *output = av_mallocz(sizeof(output_container));
	  if(Rf_length(out_channels))
	    output->channels = Rf_asInteger(out_channels);
	  if(Rf_length(sample_rate))
	    output->sample_rate = Rf_asInteger(sample_rate);
	  if(Rf_length(out_format))
	    output->format_name = CHAR(STRING_ELT(out_format, 0));
	  output->audio_input = open_audio_input(CHAR(STRING_ELT(audio, 0)));
	  double start_pts = Rf_length(start_pos) ? Rf_asReal(start_pos) : 0;
	  if(start_pts > 0)
	    av_seek_frame(output->audio_input->demuxer, -1, start_pts * AV_TIME_BASE, AVSEEK_FLAG_ANY);
	  if(Rf_length(max_len))
	    output->max_pts = (Rf_asReal(max_len) + start_pts) * AV_TIME_BASE;
	  output->output_file = CHAR(STRING_ELT(out_file, 0));
	  R_UnwindProtect(encode_audio_input, output, close_output_file, output, NULL);
	  return out_file;
	}
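
To make the request more concrete, here is a rough sketch of the bookkeeping that a multi-segment version would need on top of the single `start_pos` / `max_len` handling in `R_convert_audio`. It is not code from av and it does not touch FFmpeg at all: the `keep_segment` type and the `total_kept_time` and `map_to_output_time` helpers are hypothetical names I made up for illustration. It assumes the segments are sorted by start time and do not overlap, and it only shows how an input timestamp could be mapped to a position in the combined output, or dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type (not part of av): one window of input audio to keep,
 * expressed in seconds. */
typedef struct {
  double start;    /* would come from one element of start_time */
  double duration; /* would come from one element of total_time */
} keep_segment;

/* Total length of the concatenated output, in seconds. */
double total_kept_time(const keep_segment *segs, size_t n) {
  assert(n == 0 || segs != NULL);
  double total = 0;
  for (size_t i = 0; i < n; i++)
    total += segs[i].duration;
  return total;
}

/* Map an input timestamp to its position in the concatenated output.
 * Returns -1 if the timestamp lies outside every segment, meaning the
 * frame would be dropped.
 * Assumption: segments are sorted by start and do not overlap. */
double map_to_output_time(const keep_segment *segs, size_t n, double input_time) {
  assert(n == 0 || segs != NULL);
  double offset = 0; /* output time already taken by earlier segments */
  for (size_t i = 0; i < n; i++) {
    double end = segs[i].start + segs[i].duration;
    if (input_time >= segs[i].start && input_time < end)
      return offset + (input_time - segs[i].start);
    offset += segs[i].duration;
  }
  return -1;
}
```

With the two voiced segments from my example (0.09 to 3.30 and 3.72 to 6.78), an input time of 4.0 s would land at roughly 3.49 s in the combined file, while 3.5 s falls in the silent gap and would be skipped. How this would plug into the actual seeking and encoding loop is something I cannot judge.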
