nordicsushi/async-youtube-transcript

Async YouTube Transcript (youscript)

An asyncio-based YouTube transcript fetcher that retrieves transcripts for multiple videos concurrently. It exposes both a Python API and a small CLI.

Features

  • Concurrent transcript fetching for many videos
  • Manual and auto-generated caption support
  • Simple Python API with async context manager
  • Minimal CLI for quick one-off usage

Requirements

  • Python 3.12+
  • uv (for environment management and running commands)

Installation (uv)

  • Create/sync the project environment:

    • uv sync
  • If your workflow needs the package to be importable, install it in editable mode:

    • uv pip install -e .

Quick Start

CLI

  • Fetch English transcripts for a single video and save as text:
    • uv run ust "https://www.youtube.com/watch?v=VIDEO_ID" -l en -s txt

Options

  • -l/--languages: transcript language code; repeatable; default en
  • -s/--save-as: output format; default txt
  • -p/--proxy: proxy URL; optional
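
To illustrate how flags like these typically behave, here is a minimal argparse sketch. It is not the project's actual CLI code: `build_parser` and `parse` are hypothetical names, and it assumes that "repeatable" means passing `-l` more than once (for example `-l en -l de`).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; the real CLI may differ.
    parser = argparse.ArgumentParser(prog="ust")
    parser.add_argument("url", help="YouTube video URL")
    # action="append" collects every occurrence of -l into a list.
    # The default is applied later, because argparse would otherwise
    # append user values onto a default list instead of replacing it.
    parser.add_argument("-l", "--languages", action="append", default=None,
                        help="language code, may be given several times")
    parser.add_argument("-s", "--save-as", default="txt", help="output format")
    parser.add_argument("-p", "--proxy", default=None, help="proxy URL")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.languages is None:
        args.languages = ["en"]  # documented default
    return args
```

Under these assumptions, `parse(["https://www.youtube.com/watch?v=VIDEO_ID", "-l", "en", "-l", "de"])` yields a `languages` list of `["en", "de"]`, and omitting `-l` falls back to `["en"]`.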

Python API

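import asyncio
from youscript.client import YouScript

# Videos whose transcripts should be fetched in one batch.
urls = [
    "https://www.youtube.com/watch?v=HyzlYwjoXOQ",
    "https://www.youtube.com/watch?v=0VLAoVGf_74",
]

async def main():
    # The async context manager handles client setup and cleanup.
    async with YouScript() as client:
        # Fetch transcripts for all URLs concurrently.
        results = await client.get_transcripts(urls, languages=["en"])
        # results maps each URL to the transcripts found for it.
        for url, transcripts in results.items():
            print(url, "->", len(transcripts), "transcript variants")

asyncio.run(main())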

Benchmarking

  • Summary: for batch fetching, the async client is substantially faster than the synchronous youtube-transcript-api.
  • Why: transcript retrieval is network-bound; async overlaps I/O so total time approaches the slowest single request, not the sum of all.

Reproduce locally with uv

  • Install dependencies and run the benchmark in one step:
    • uv run --with youtube-transcript-api scripts/benchmark.py
  • The script prints two lines:
    • Async time elapsed: <seconds>
    • Sequential time elapsed: <seconds>

Notes

  • Sequential baseline: youtube-transcript-api's .fetch(...) called once per video, one after another.
  • Async path: our client fetches concurrently using asyncio.TaskGroup.
  • Exact speedup varies with network/YouTube latency; expect notable gains on batches.
