AI PodcastifyAI

AI PodcastifyAI is an application that transforms scientific papers and web content into engaging podcast-style conversations using artificial intelligence. It uses a language model and text-to-speech technology to turn complex text into informative, accessible audio. It is a poor man's version of Google's NotebookLM AI podcast feature.

demo.mp4

sample-audio.mp4

Features

Text Input: Enter scientific text or a webpage URL directly into the application.
AI-Powered Dialogue Generation: Utilizes KoboldCPP to generate a natural conversation between two speakers based on the input content.
Text-to-Speech Conversion: Employs StyleTTS2 to convert the generated dialogue into lifelike speech.
Multi-Voice Support: Creates a dynamic listening experience with distinct voices for different speakers.
Audiobook Creation: Combines individual audio segments into a cohesive MP3 audiobook.
User-Friendly GUI: Offers an intuitive interface for easy interaction and processing.

Requirements

Python 3.8+
tkinter
requests
BeautifulSoup4
StyleTTS2 API
scipy
pydub
numpy
tortoise-tts

Installation

Clone this repository:

```
git clone https://github.com/PasiKoodaa/ai-podcastify.git
cd ai-podcastify
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Prepare two voice samples, one per speaker, named melinda_voice.wav and steve_voice.wav, and place them in the same folder as main.py.
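
A quick pre-flight check can confirm that both samples are in place before launching the GUI. This is a minimal sketch and not part of main.py; the helper name `missing_voice_samples` is hypothetical, and only the two file names come from the step above.

```python
from pathlib import Path

# File names taken from the installation step above.
VOICE_SAMPLES = ("melinda_voice.wav", "steve_voice.wav")


def missing_voice_samples(folder="."):
    """Return the expected voice sample names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in VOICE_SAMPLES if not (base / name).is_file()]
```

Calling `missing_voice_samples()` from the folder that contains main.py returns an empty list when both files are present.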

Usage

Ensure you have KoboldCPP running locally on port 5001.
Run the application:
```
python main.py
```
In the GUI:
- Enter the text of a scientific paper or a webpage URL in the input area.
- Click "Process" to generate the podcast dialogue.
- Once processing is complete, click "Create Audiobook" to generate the MP3 file.
The resulting audiobook will be saved as audiobook.mp3 in the same directory.
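
Because the first step depends on KoboldCPP listening on port 5001, it can help to verify that before starting the app. The sketch below only tests whether something accepts TCP connections on that port; it does not confirm that the listener is actually KoboldCPP. The function name `is_port_open` is an illustrative assumption.

```python
import socket


def is_port_open(host="127.0.0.1", port=5001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `is_port_open()` returns False, start KoboldCPP first and then run main.py.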

How It Works

Text Processing: The app uses pasted text as-is, or fetches the page content when a URL is provided.
Dialogue Generation: KoboldCPP generates a conversational dialogue based on the input.
Text-to-Speech: StyleTTS2 converts the dialogue into speech for each speaker.
Audio Compilation: Individual audio segments are combined into a single MP3 file.
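
The dialogue generation step and the hand-off to text-to-speech could look roughly like the sketch below. It is not the code in main.py. Assumptions: the request goes to KoboldCPP's KoboldAI-compatible `/api/v1/generate` endpoint, the prompt wording and sampling values are made up for illustration, the speaker names Melinda and Steve are inferred from the voice sample file names, and `urllib` stands in for `requests` to keep the sketch dependency-free. The StyleTTS2 and MP3 steps are left out.

```python
import json
import re
import urllib.request

# Port 5001 comes from this README; the path is KoboldCPP's KoboldAI-compatible endpoint.
KOBOLD_URL = "http://localhost:5001/api/v1/generate"

# Assumption: speaker names mirror the voice sample file names.
VOICES = {"Melinda": "melinda_voice.wav", "Steve": "steve_voice.wav"}


def build_payload(source_text, max_length=512):
    """Build a generate request body; the prompt wording is illustrative only."""
    prompt = (
        "Rewrite the following text as a podcast conversation between "
        "Melinda and Steve. Start every line with 'Melinda:' or 'Steve:'.\n\n"
        f"{source_text}\n\nConversation:\n"
    )
    # Sampling values are placeholders, not the app's actual settings.
    return {"prompt": prompt, "max_length": max_length, "temperature": 0.7}


def generate_dialogue(source_text):
    """POST the payload to a running KoboldCPP instance and return the generated text."""
    body = json.dumps(build_payload(source_text)).encode("utf-8")
    request = urllib.request.Request(
        KOBOLD_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)["results"][0]["text"]


def split_turns(dialogue):
    """Turn 'Speaker: line' text into (speaker, voice_file, line) tuples."""
    turns = []
    for raw in dialogue.splitlines():
        match = re.match(r"\s*(\w+)\s*:\s*(.+)", raw)
        # Lines without a known speaker prefix are skipped.
        if match and match.group(1) in VOICES:
            speaker, line = match.group(1), match.group(2).strip()
            turns.append((speaker, VOICES[speaker], line))
    return turns
```

Each tuple from `split_turns` pairs one line of dialogue with the voice sample to use for it, which is the information the text-to-speech step needs per segment.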

Limitations

Requires a local instance of KoboldCPP running on port 5001.
Processing time may vary based on input length and system capabilities.
Internet connection required for webpage content fetching.
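
Since only URL input needs network access, the app has to tell a URL apart from pasted text. One possible check is sketched below; the function name `is_url` and the rule "a single http(s) address with no whitespace" are assumptions, not the logic used in main.py.

```python
from urllib.parse import urlparse


def is_url(user_input):
    """Return True when the input is a single http(s) URL rather than pasted text."""
    candidate = user_input.strip()
    # Pasted paper text contains whitespace; a lone URL does not.
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed inputs, such as a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With a check like this, pasted text can be processed entirely offline, and only inputs that pass it trigger a web request.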

Acknowledgments

KoboldCPP for dialogue generation: https://github.com/LostRuins/koboldcpp
StyleTTS2 API for text-to-speech conversion: https://github.com/NeuralVox/StyleTTS2 (at the moment this is very hard to get working on Windows)
All other open-source libraries used in this project
