[Feature Request] Youtube Compatible Transcript #151

Open
rajeshkumaryadavdotcom opened this issue Oct 23, 2023 · 4 comments

Comments

@rajeshkumaryadavdotcom

Hi,

Thank you very much for whisper-jax, it is very useful.

I would like to request a feature for https://huggingface.co/spaces/sanchit-gandhi/whisper-jax. Currently, after the transcript is generated, I need to go to ChatGPT and ask it to convert the transcript into a format that YouTube accepts.

Could you please add one more radio option next to transcribe and translate, such as YouTube subtitle? You could also add another option to write a YouTube video description based on the transcript.

Regards,
Raj

@rajeshkumaryadavdotcom
Author

The issue is that ChatGPT is not able to convert your output for a long video (around 20 minutes) into YouTube subtitles. It says: "I apologize for any misunderstanding. Generating a large amount of text, such as full subtitles for a 19-minute video, is beyond the capabilities of this platform. However, I can help you generate a summary or key points from the video if you provide me with the specific details or time stamps of the sections you need assistance with. Please let me know how I can assist you further."

@sanchit-gandhi
Owner

Hey @rajeshkumaryadavdotcom! Thanks for your interest in Whisper JAX, and glad to hear it's a useful resource! The demo is intended to be a demonstration of the Whisper model for speech transcription, rather than a fully-fledged meeting transcription tool. If you'd like these features, feel free to fork the Space and add them on top! However, they belong more to a product than to the ML demo this is meant to be.

@iGerman00

Hello, @sanchit-gandhi. I appreciate your generosity in providing the HF Space to the public. It's a great resource for quick general transcription tasks, and also for its API, although that is hidden in the UI. I'm replying to this issue since it's on the topic of YouTube transcriptions.

I'm working on a userscript (mod) for YouTube that can transcribe any video and display the subtitles in the player natively. I've attached a demo video. I've been able to transcribe videos up to 50 minutes long. yt-dlp sometimes fails in the Space, or it returns a 504 on longer videos, but it usually works after a few tries. As you said, it is a demo, so I'm fine with that. I still have some things to finish, but it would become a very useful tool that I always wanted to have. It seems like an ideal use case for this, and it helps me a lot to have better automatic captions than YouTube's.

I wanted to ask whether this is acceptable to you. I understand that running TPUs like that must be costly, but I read that it's supported by Google's TRC programme, so I just wanted to confirm that it's okay. I might publish my project to a userscript directory in the future, which would mean more people using it (I am not sure exactly how many), or I can just keep it for personal use, depending on how okay you are with it.

Thank you in advance.

Screen.Recording.2024-01-14.at.14.50.422.mp4

@iGerman00

@rajeshkumaryadavdotcom If you are familiar with Node, I wrote a rough, simple parser for the timestamped output of whisper-jax. You can modify it to suit your desired format:

const fs = require('fs');

function customFormatToJson(subtitleContent) {
    const subtitleBlocks = subtitleContent.split('\n'); // Assuming each subtitle is on a new line
    const jsonSubtitles = { events: [] };

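    subtitleBlocks.forEach(block => {
        const line = block.trim();
        // Skip blank lines and anything without a "[start -> end] text" prefix
        const splitAt = line.indexOf('] ');
        if (!line.startsWith('[') || splitAt === -1) return;

        const timeRange = line.slice(1, splitAt).split(' -> ');
        const startTime = customTimeToMs(timeRange[0]);
        const endTime = customTimeToMs(timeRange[1]);
        // Keep everything after the first "] ", even if the text itself contains "] "
        const text = line.slice(splitAt + 2).trim();

        jsonSubtitles.events.push({
            tStartMs: startTime,
            dDurationMs: endTime - startTime,
            segs: [{ utf8: text }]
        });
    });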

    return jsonSubtitles;
}

function customTimeToMs(timeStr) {
    if (!timeStr || !timeStr.includes(':')) return 0;
    const [hoursMinSec, milli = '0'] = timeStr.trim().split('.');
    // Examples: "15:22.570", or "01:15:22.570" when hours are present
    const parts = hoursMinSec.split(':').map(part => parseInt(part, 10));
    const seconds = parts.pop();
    const minutes = parts.pop();
    const hours = parts.length ? parts.pop() : 0;
    return hours * 3600000 + minutes * 60000 + seconds * 1000 + parseInt(milli, 10);
}

// The input is whisper-jax's timestamped text output, not SRT
const rawTranscript = fs.readFileSync('jax-output-timestamps.txt', 'utf8');
const jsonSubtitles = customFormatToJson(rawTranscript);

console.log(JSON.stringify(jsonSubtitles, null, 2));

Currently, it reads jax-output-timestamps.txt from the same directory as the script and prints the subtitles to the console in YouTube's Timed Text API json3 format. It should be easy to modify to your liking, for example to output SRT or WebVTT text, or to write to a file.
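
As a rough illustration of the SRT/WebVTT idea, here is a minimal sketch that converts an events array shaped like the one customFormatToJson returns (tStartMs, dDurationMs, segs) into SRT or WebVTT text. The helper names msToTimestamp, eventsToSrt, and eventsToVtt are made up for this example, and the sample events are invented data, not real whisper-jax output.

```javascript
// Sketch only: helper names and sample data are illustrative assumptions.

// Format milliseconds as HH:MM:SS<sep>mmm. SRT uses ',' before the
// milliseconds, WebVTT uses '.'.
function msToTimestamp(ms, separator) {
    const pad = (value, width) => String(value).padStart(width, '0');
    const hours = Math.floor(ms / 3600000);
    const minutes = Math.floor((ms % 3600000) / 60000);
    const seconds = Math.floor((ms % 60000) / 1000);
    return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${separator}${pad(ms % 1000, 3)}`;
}

// Join the text segments of one event into a single cue text.
function eventText(event) {
    return event.segs.map(seg => seg.utf8).join('');
}

// SRT: numbered cues, each followed by a blank line.
function eventsToSrt(events) {
    return events.map((event, index) => {
        const start = msToTimestamp(event.tStartMs, ',');
        const end = msToTimestamp(event.tStartMs + event.dDurationMs, ',');
        return `${index + 1}\n${start} --> ${end}\n${eventText(event)}\n`;
    }).join('\n');
}

// WebVTT: a "WEBVTT" header line, a blank line, then unnumbered cues.
function eventsToVtt(events) {
    const cues = events.map(event => {
        const start = msToTimestamp(event.tStartMs, '.');
        const end = msToTimestamp(event.tStartMs + event.dDurationMs, '.');
        return `${start} --> ${end}\n${eventText(event)}\n`;
    });
    return `WEBVTT\n\n${cues.join('\n')}`;
}

// Invented sample in the same shape as jsonSubtitles.events
const sampleEvents = [
    { tStartMs: 0, dDurationMs: 5000, segs: [{ utf8: 'Hello there.' }] },
    { tStartMs: 5000, dDurationMs: 2570, segs: [{ utf8: 'Second line.' }] }
];

console.log(eventsToSrt(sampleEvents));
```

In the script above, calling eventsToSrt(jsonSubtitles.events) and passing the result to fs.writeFileSync would write an .srt file instead of printing JSON. This sketch does not escape cue text, so a transcript line containing "-->" would need extra handling.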


3 participants