stream output & file processing #1

kqvanity · 2024-10-30T11:03:24Z

Looking for a feature adjustment

[Feature] Can the continuous stream be a single chunk of text that gets updated with new transcriptions, instead of printing each one on a new line?
[Bug] File processing doesn't seem to work. I pass an 8M FLAC audio file; gstt takes its time, but outputs nothing at the end.

giulianopz · 2024-10-31T07:47:18Z

No problem for the feature request: I've already thought to implement it and I'm going to do it in the next few days.

As to the problem you mentioned: could you please provide me with some more info to reproduce the issue? I would need at least the exact command you typed and the FLAC file itself (upload it somewhere, e.g. Google Drive). But bear in mind that if you don't pass the right sample rate of the audio file, the Google service won't be able to properly transcribe the input audio.

kqvanity · 2024-10-31T15:31:49Z

> No problem for the feature request: I've already thought to implement it and I'm going to do it in the next few days.

Thanks for your time

> bear in mind that if you don't pass the right sample rate of the audio file

I tried grabbing the sample rate with mediainfo and passing it explicitly, but I get the following error:

  gstt --sample-rate --file Using\ Wget\ As\ A\ Download\ Manager.flac
flag provided but not defined: -sample-rate
Usage:
    gstt [OPTION]... --interim --continuous [--file FILE]

Options:
        --verbose
        --file, path of audio file to trascript
        --key, api key built into chromium
        --language, language of the recording transcription, use the standard webcodes for your language, i.e. 'en-US' for English-US, 'ru' for Russian, etc. please, see https://en.wikipedia.org/wiki/IETF_language_tag
        --continuous, to keep the stream open and transcoding as long as there is no silence
        --interim, to send back results before its finished, so you get a live stream of possible transcriptions as it processes the audio
        --max-alts, how many possible transcriptions do you want
        --pfilter, profanity filter ('0'=off, '1'=medium, '2'=strict)
        --user-agent, user-agent for spoofing
        --sample-rate, audio sampling rate

Audio file

giulianopz · 2024-11-05T22:10:53Z

Hi @kqvanity,

There was a typo in the sample rate flag that is now fixed, sorry.

Anyway, I looked at your file and found that it has two channels. You can convert it to a single channel with:
ffmpeg -i afogr3.flac -ac 1 mono.flac
I added a side note about it in the README, linking the ffmpeg docs that explain this command: https://trac.ffmpeg.org/wiki/AudioChannelManipulation

Then, use the new flag to write output on the same line as follows:
gstt --interim --continuous --subtitle-mode --file mono.flac
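
For anyone who wants to check up front whether a FLAC file needs the mono conversion, and which value to pass to --sample-rate, here is a minimal Python sketch that reads both from the file's STREAMINFO block. It is not part of gstt; the function name `flac_stream_info` is made up for illustration, and it assumes the file starts directly with the `fLaC` marker (no prepended ID3 tag).

```python
def flac_stream_info(data: bytes) -> tuple:
    """Return (sample_rate_hz, channels) from the first 42 bytes of a FLAC file.

    Hypothetical helper, not part of gstt.
    """
    if len(data) < 42:
        raise ValueError("need at least the first 42 bytes of the file")
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Metadata block header: 1 bit 'last block' flag + 7 bit block type.
    # STREAMINFO has type 0 and must be the first metadata block.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    body = data[8:42]  # the 34-byte STREAMINFO body
    # Bytes 10..17 of the body pack, big-endian:
    # 20 bits sample rate, 3 bits (channels - 1),
    # 5 bits (bits per sample - 1), 36 bits total samples.
    packed = int.from_bytes(body[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    return sample_rate, channels
```

Usage would look like `with open("mono.flac", "rb") as f: rate, channels = flac_stream_info(f.read(42))`. If `channels` is greater than 1, the ffmpeg command above applies.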

kqvanity · 2024-11-05T23:44:45Z

Great, it works now.

I'm not aware of the complexity involved, but I was wondering whether the conversion could be done on the fly, be it for an audio or a video stream (e.g. dual-channel live streams, which can only be processed in real time). Performance-wise, re-encoding would take place either way.

> --subtitle-mode

Not available yet. Only continuous stream.

giulianopz · 2024-11-06T20:16:18Z

Make sure you are using the latest version (tag v0.1.0).

kqvanity · 2024-11-06T20:44:35Z

My bad, the binary was go-gsttt. It does eliminate the extra noise; however, I ran into two issues:

The raw text keeps overwriting the terminal header.
Reading a currently running stream from rec no longer outputs anything (I tried it with a YouTube video and with a FLAC file playing in mpv).

giulianopz · 2024-11-14T08:10:01Z

Hi @kqvanity, sorry for the late reply.

Erasing the entire screen (including the terminal prompt) is the intended behavior. Why would you want to preserve it? It makes no sense to me.

Reading from live input (e.g.) still works indeed. Just wait a little longer, because I am skipping non-final intermediate transcriptions. They are of little use IMO.

kqvanity · 2024-11-19T00:40:47Z

> Why would you want to preserve it?

I'd like to aggregate the results, in order to have some context while reading, or pipe the results to a text file.

giulianopz · 2024-11-22T13:39:12Z

Actually you can still pipe the command output to a file, just remove the ANSI escape codes at the end of each transcription segment. Otherwise, just do not use that flag.
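
As a rough illustration of the "remove the ANSI escape codes" step, here is a small Python filter. It assumes the codes emitted are standard CSI sequences (ESC followed by `[`, such as erase-screen or cursor-home); which sequences gstt actually writes is not stated in this thread, so treat the pattern as an assumption. The names `strip_ansi` and `filter_stream` are made up for this sketch.

```python
import re
import sys

# CSI sequence per ECMA-48: ESC [ , parameter bytes (0x30-0x3F),
# intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences, leaving the plain transcription text."""
    return CSI_RE.sub("", text)


def filter_stream(src=sys.stdin, dst=sys.stdout) -> None:
    """Copy src to dst line by line with escape sequences removed."""
    for line in src:
        dst.write(strip_ansi(line))
        dst.flush()
```

Saved as, say, strip_ansi.py with a final `filter_stream()` call, it could sit between the command and the file: `gstt ... | python3 strip_ansi.py > transcript.txt`.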

kqvanity · 2024-11-28T17:06:48Z

> erasing the entire screen (including the terminal prompt) is the intended behavior.

I think it's a bit intrusive for it to clear the terminal. You could make it an optional flag.
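
To make the suggestion concrete: a less intrusive variant would rewrite only the current line instead of clearing the screen. The Python sketch below shows the general technique with a carriage return plus the erase-line sequence. It is not how gstt implements its flag; the function names are hypothetical, and it only works cleanly while the text fits on one terminal line.

```python
import sys

ERASE_LINE = "\x1b[2K"  # CSI 2 K: erase the entire current line


def interim_frame(text: str) -> str:
    """Control string that replaces the current line with an interim result."""
    # "\r" moves the cursor to column 0; no newline, so the next
    # frame overwrites this one.
    return "\r" + ERASE_LINE + text


def final_frame(text: str) -> str:
    """Control string that commits a final result and moves to a new line."""
    return "\r" + ERASE_LINE + text + "\n"


def emit(frame: str, out=sys.stdout) -> None:
    """Write a frame and flush so the update is visible immediately."""
    out.write(frame)
    out.flush()
```

Calling `emit(interim_frame("hel"))` followed by `emit(final_frame("hello"))` leaves a single line reading "hello" and keeps everything above it, including the prompt and earlier segments, in the scrollback.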

giulianopz closed this as completed Nov 22, 2024
