An asyncio-based YouTube transcript fetcher that retrieves multiple videos concurrently, exposing both a Python API and a small CLI.
- Concurrent transcript fetching for many videos
- Manual and auto-generated caption support
- Simple Python API with async context manager
- Minimal CLI for quick one-off usage
- Python 3.12+
- uv (for environment and workflows)
- Create/sync the project environment:

  ```bash
  uv sync
  ```
- If your workflow needs the package importable, install it in editable mode:

  ```bash
  uv pip install -e .
  ```
- Fetch English transcripts for a single video and save them as text:

  ```bash
  uv run ust "https://www.youtube.com/watch?v=VIDEO_ID" -l en -s txt
  ```
Options

- `-l/--languages`: transcript language(s); repeatable; default `en`
- `-s/--save-as`: output format; default `txt`
- `-p/--proxy`: proxy URL (optional)
```python
import asyncio

from youscript.client import YouScript

urls = [
    "https://www.youtube.com/watch?v=HyzlYwjoXOQ",
    "https://www.youtube.com/watch?v=0VLAoVGf_74",
]

async def main():
    async with YouScript() as client:
        results = await client.get_transcripts(urls, languages=["en"])
        for url, transcripts in results.items():
            print(url, "->", len(transcripts), "transcript variants")

asyncio.run(main())
```

- Summary: the async client is substantially faster than the non-async `youtube-transcript-api` for batch fetching.
- Why: transcript retrieval is network-bound; async overlaps I/O, so total time approaches the slowest single request rather than the sum of all requests.
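The overlap effect is easy to demonstrate without hitting YouTube at all. The sketch below simulates network-bound requests with `asyncio.sleep` (the delays and `fake_fetch` helper are illustrative, not part of this project): concurrent elapsed time tracks the slowest single "request", while sequential time tracks the sum.

```python
import asyncio
import time

async def fake_fetch(delay: float) -> float:
    # Stand-in for one network-bound transcript request.
    await asyncio.sleep(delay)
    return delay

async def run_both() -> tuple[float, float]:
    delays = [0.3, 0.2, 0.1]

    # Concurrent: all "requests" overlap, so elapsed ~ max(delays).
    start = time.perf_counter()
    await asyncio.gather(*(fake_fetch(d) for d in delays))
    concurrent = time.perf_counter() - start

    # Sequential: each "request" waits for the previous, so elapsed ~ sum(delays).
    start = time.perf_counter()
    for d in delays:
        await fake_fetch(d)
    sequential = time.perf_counter() - start

    return concurrent, sequential

concurrent_s, sequential_s = asyncio.run(run_both())
print(f"concurrent: {concurrent_s:.2f}s, sequential: {sequential_s:.2f}s")
```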
Reproduce locally with uv
- Install deps and run the benchmark in one shot:

  ```bash
  uv run --with youtube-transcript-api scripts/benchmark.py
  ```
- The script prints two lines:

  ```
  Async time elapsed: <seconds>
  Sequential time elapsed: <seconds>
  ```
Snapshot (example run)
Notes
- Sequential baseline: `youtube-transcript-api`'s `fetch(...)` called one-by-one.
- Async path: our client fetches concurrently using `asyncio.TaskGroup`.
- Exact speedup varies with network/YouTube latency; expect notable gains on larger batches.
