AI Podcastify
AI Podcastify is an application that transforms scientific papers and web content into engaging podcast-style conversations using artificial intelligence. It combines a large language model (run through KoboldCPP) with text-to-speech (StyleTTS2) to turn complex text into informative, accessible audio. It is a poor man's version of Google's NotebookLM AI podcast feature.
Features:
- Text Input: Enter scientific text or a webpage URL directly into the application.
- AI-Powered Dialogue Generation: Utilizes KoboldCPP to generate a natural conversation between two speakers based on the input content.
- Text-to-Speech Conversion: Employs StyleTTS2 to convert the generated dialogue into lifelike speech.
- Multi-Voice Support: Creates a dynamic listening experience with distinct voices for different speakers.
- Audiobook Creation: Combines individual audio segments into a cohesive MP3 audiobook.
- User-Friendly GUI: Offers an intuitive interface for easy interaction and processing.
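The Text Input feature accepts either pasted text or a URL. Below is a minimal sketch of that branching, assuming that a URL is any http(s) string with a host. It uses only the standard library for illustration (the project itself lists requests and BeautifulSoup4 for this job), and `get_source_text`, `is_url`, and `html_to_text` are hypothetical names, not functions from main.py:

```python
"""Sketch of the "Text Input" step: accept pasted text or a webpage URL.

Assumption: this is NOT the project's main.py code. The project uses
requests + BeautifulSoup4; this sketch uses only the standard library.
"""
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def is_url(user_input: str) -> bool:
    """Treat input as a URL only if it parses as http(s) with a host."""
    parsed = urlparse(user_input.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def get_source_text(user_input: str, timeout: float = 30.0) -> str:
    """Fetch and extract a webpage if given a URL, else return the text as-is."""
    if not is_url(user_input):
        return user_input.strip()
    req = Request(user_input.strip(), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Stripping `<script>` and `<style>` matters here because their contents would otherwise end up in the prompt sent to the language model.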
Requirements:
- Python 3.8+
- tkinter
- requests
- BeautifulSoup4
- StyleTTS2 API
- scipy
- pydub
- numpy
- tortoise-tts
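Before running main.py, you can confirm that the Python packages listed above are installed. This is a sketch, not a script shipped with the project. The import names (`bs4` for BeautifulSoup4, `tortoise` for tortoise-tts) are the usual ones but are assumptions here, and the StyleTTS2 API is skipped because its import name depends on how it was installed:

```python
"""Check that the Python packages from the requirements list are importable.

Assumption: the module names below are the usual import names for each
package (BeautifulSoup4 -> bs4, tortoise-tts -> tortoise). The StyleTTS2
API is not checked because its import name depends on the installation.
"""
import importlib.util
import sys

# Requirement as listed in the README -> module name it is imported as.
REQUIRED_MODULES = {
    "tkinter": "tkinter",
    "requests": "requests",
    "BeautifulSoup4": "bs4",
    "scipy": "scipy",
    "pydub": "pydub",
    "numpy": "numpy",
    "tortoise-tts": "tortoise",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the requirement names whose modules cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


def python_ok(minimum=(3, 8)):
    """True if the running interpreter meets the minimum (major, minor)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8+ is required, found", sys.version.split()[0])
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed Python packages were found.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not load heavy packages such as tortoise-tts.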
Installation and usage:
- Clone this repository:
    git clone https://github.com/PasiKoodaa/ai-podcastify.git
    cd ai-podcastify
- Install the required dependencies:
    pip install -r requirements.txt
- Prepare voice samples named melinda_voice.wav and steve_voice.wav for the two speakers and put them in the same folder as main.py.
- Make sure KoboldCPP is running locally on port 5001.
- Run the application:
    python main.py
- In the GUI:
  - Enter the text of a scientific paper or a webpage URL in the input area.
  - Click "Process" to generate the podcast dialogue.
  - Once processing is complete, click "Create Audiobook" to generate the MP3 file.
- The resulting audiobook is saved as audiobook.mp3 in the same directory.
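Producing audiobook.mp3 comes down to joining the per-line speech clips into one file. Below is a minimal sketch, assuming every clip is a WAV file with the same sample rate, sample width, and channel count. It uses the standard-library `wave` module for the join and pydub only for the final MP3 export. `concat_wavs` and `wav_to_mp3` are hypothetical names, the 0.3 s pause is an arbitrary choice, and pydub's MP3 export needs ffmpeg on the PATH:

```python
"""Sketch of the "Audio Compilation" step: join per-line WAV clips.

Assumptions (not from main.py): every clip is a WAV file with the same
sample rate, sample width, and channel count, and a short pause is wanted
between speaker turns. The join uses the stdlib `wave` module; pydub is
only used for the optional MP3 conversion.
"""
import wave
from typing import List


def concat_wavs(clip_paths: List[str], out_path: str, pause_s: float = 0.3) -> None:
    """Concatenate WAV clips into one WAV, inserting silence between clips."""
    if not clip_paths:
        raise ValueError("no clips to combine")
    with wave.open(clip_paths[0], "rb") as first:
        params = first.getparams()
    frame_bytes = params.sampwidth * params.nchannels
    # 8-bit WAV is unsigned (silence = 0x80); wider WAV samples are signed (silence = 0).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (int(params.framerate * pause_s) * frame_bytes)

    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is fixed up on close
        for i, path in enumerate(clip_paths):
            with wave.open(path, "rb") as clip:
                # Compare (nchannels, sampwidth, framerate) with the first clip.
                if clip.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path}: format differs from first clip")
                if i:
                    out.writeframes(silence)
                out.writeframes(clip.readframes(clip.getnframes()))


def wav_to_mp3(wav_path: str, mp3_path: str = "audiobook.mp3") -> None:
    """Convert the combined WAV to MP3 with pydub (requires ffmpeg)."""
    from pydub import AudioSegment  # imported lazily: optional dependency

    AudioSegment.from_wav(wav_path).export(mp3_path, format="mp3")
```

Usage would be `concat_wavs(["line_000.wav", "line_001.wav"], "combined.wav")` followed by `wav_to_mp3("combined.wav")`. The clip file names are hypothetical. Raising an error on a format mismatch is deliberate, because silently concatenating clips with different sample rates would change the playback speed.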
How it works:
- Text Processing: The app uses the pasted text as-is, or fetches the content of the provided URL.
- Dialogue Generation: KoboldCPP generates a conversational dialogue based on the input.
- Text-to-Speech: StyleTTS2 converts the dialogue into speech for each speaker.
- Audio Compilation: Individual audio segments are combined into a single MP3 file.
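The Dialogue Generation and Text-to-Speech steps depend on turning the model's output into speaker turns, each paired with a voice sample. Below is a minimal sketch of that hand-off, with several assumptions. It targets KoboldCPP's KoboldAI-compatible `POST /api/v1/generate` endpoint on port 5001. It guesses the speaker names Melinda and Steve from the voice sample file names and expects a `Name: line` output format. The prompt text and all function names are hypothetical, not the project's actual prompt or code:

```python
"""Sketch of the "Dialogue Generation" step against a local KoboldCPP.

Assumptions (not from main.py): the prompt wording, the speaker names
"Melinda" and "Steve" (guessed from the voice file names), and the
"Name: line" output format. The endpoint is KoboldCPP's KoboldAI-compatible
/api/v1/generate on the port this README uses (5001).
"""
import json
import re
from typing import List, Tuple
from urllib.request import Request, urlopen

KOBOLD_URL = "http://localhost:5001/api/v1/generate"
VOICES = {"Melinda": "melinda_voice.wav", "Steve": "steve_voice.wav"}
TURN_RE = re.compile(r"^\s*(Melinda|Steve)\s*:\s*(.+)$")


def build_prompt(source_text: str) -> str:
    """Hypothetical prompt that primes the model to start with Melinda."""
    return (
        "Write a podcast conversation between Melinda and Steve about the text "
        "below. Start every line with 'Melinda:' or 'Steve:'.\n\n"
        f"{source_text}\n\nMelinda:"
    )


def generate(prompt: str, max_length: int = 512, timeout: float = 600.0) -> str:
    """POST a prompt to KoboldCPP and return the generated continuation."""
    body = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    req = Request(KOBOLD_URL, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["results"][0]["text"]


def parse_turns(dialogue: str) -> List[Tuple[str, str]]:
    """Split 'Name: text' lines into (speaker, text); other lines are dropped."""
    turns = []
    for line in dialogue.splitlines():
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


def dialogue_with_voices(source_text: str) -> List[Tuple[str, str, str]]:
    """Generate a dialogue and attach the voice sample file for each turn."""
    # The prompt ends with "Melinda:", so re-attach it before parsing.
    generated = "Melinda:" + generate(build_prompt(source_text))
    return [(who, text, VOICES[who]) for who, text in parse_turns(generated)]
```

Dropping lines that do not match the `Name:` pattern (stage directions, stray commentary) keeps the TTS step from reading them aloud, at the cost of losing anything the model formats unexpectedly.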
Notes:
- Requires a local instance of KoboldCPP running on port 5001.
- Processing time varies with input length and system capabilities.
- An internet connection is required to fetch webpage content.
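Because processing fails without KoboldCPP on port 5001, a quick pre-flight check can save a wasted run. Below is a minimal sketch, assuming that a successful TCP connection to localhost:5001 is good enough. It does not verify that the listener really is KoboldCPP (querying its HTTP API would be stricter), and `port_open` is a hypothetical helper:

```python
"""Check whether something is listening on KoboldCPP's port before processing.

Assumption: a successful TCP connect on localhost:5001 is treated as
"KoboldCPP is up"; it does not verify that the server really is KoboldCPP.
"""
import socket


def port_open(host: str = "localhost", port: int = 5001, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    if port_open():
        print("Something is listening on port 5001.")
    else:
        print("Nothing on port 5001: start KoboldCPP first.")
```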
Acknowledgments:
- KoboldCPP for dialogue generation: https://github.com/LostRuins/koboldcpp
- StyleTTS2 API for text-to-speech conversion: https://github.com/NeuralVox/StyleTTS2 (at the moment, this is very difficult to get working on Windows)
- All other open-source libraries used in this project
