RAGTube is an open-source project developed by Paco that transforms YouTube video content into a structured knowledge base for Retrieval-Augmented Generation (RAG). It automates the extraction of video metadata, descriptions, and transcripts, and compiles them into a format that supplies small language models with precise, context-rich information for specific tasks.
- Automated Video Data Extraction: Efficiently pulls metadata, descriptions, and transcripts from YouTube.
- Dockerized Application Architecture: Uses separate Docker containers, one for scraping YouTube data and one for managing the RAG component.
- Scalable and Customizable: Designed to handle large datasets and adaptable to specific user needs.
- Seamless RAG Integration: Provides structured data ready to be utilized by RAG models for improved data retrieval.
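To make the idea of "structured data ready for RAG" concrete, here is a minimal Python sketch of how scraped video data could be split into retrievable chunks. It is an illustration only, not RAGTube's actual schema or code: the `VideoRecord` class, the `chunk_transcript` function, and the field names are all hypothetical, and the fixed-size overlapping word window is just one common chunking strategy.

```python
from dataclasses import dataclass


@dataclass
class VideoRecord:
    """Hypothetical container for the data scraped from one video."""
    video_id: str
    title: str
    description: str
    transcript: str


def chunk_transcript(record: VideoRecord, chunk_words: int = 50,
                     overlap: int = 10) -> list[dict]:
    """Split a transcript into overlapping word windows.

    Each chunk carries the video's metadata so a retriever can trace
    a matching passage back to its source video.
    """
    if chunk_words <= 0 or not 0 <= overlap < chunk_words:
        raise ValueError("need chunk_words > 0 and 0 <= overlap < chunk_words")

    words = record.transcript.split()
    step = chunk_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_words]
        chunks.append({
            "video_id": record.video_id,
            "title": record.title,
            "chunk_index": len(chunks),
            "text": " ".join(window),
        })
        # Stop once a window has reached the end of the transcript,
        # so no trailing chunk consists only of overlap.
        if start + chunk_words >= len(words):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; the chunk size and overlap would need tuning for the embedding model and the kind of videos being indexed.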
RAGTube is containerized with Docker to simplify deployment and ensure consistency across environments. To run it, you need the following installed:
- Docker
- Docker Compose
Detailed instructions on setting up and using RAGTube are available in the /docs directory. These documents provide comprehensive guidelines on deploying Docker containers, configuring the system, and executing the scripts within the Dockerized environment.
We are excited to welcome new contributors! If you're interested in improving RAGTube, please take a look at CONTRIBUTING.md for our code of conduct and contribution guidelines. Join us in enhancing and expanding this project!
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Paco - Initial work - bepitic
- Thanks to everyone who has contributed to open-source projects that inspired this work.
For support, feedback, or inquiries, please open an issue in this repository.