Transform your favorite book chapters into stunning narrated video stories with AI-powered automation.
Teller of Tales is an intelligent automation system that converts written book chapters into professional-quality narrated videos. By leveraging natural language processing, advanced language models, and AI image generation, the project creates compelling visual narratives with synchronized voiceovers, background music, and text overlays—all in a fully automated, scalable pipeline.
This tool enables content creators, educational platforms, and storytellers to generate engaging video content at scale, reducing production time from hours to minutes while maintaining high-quality output.
- NLP with OpenAI, Ollama, or KeyBERT
- Image generation with Stable Diffusion
- Text-to-speech with Edge TTS or ElevenLabs
- Video editing with MoviePy
Example output: The.Red.Rose.and.The.Black.Rose.Short.mp4
- Input File: projects/[project_name]/story.txt
- Action: User provides a text file containing a chapter.
- Output: Folder structure initialized for the project.
Components involved in splitting the text (a code sketch follows the list):

Text Storage → Text Splitter → Sentence Fragmentator

- Text Storage: Loads story.txt using read_file().
- Text Cleaner: Uses clean_text() to normalize the text (removes special characters).
- Sentence Splitter: Uses NLTK sent_tokenize() to split the text into sentences.
- Fragment Aggregator: Combines sentences into ~N-word fragments (FRAGMENT_LENGTH) for manageable processing.
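A minimal sketch of this stage, assuming Python 3.8 with NLTK installed; clean_text() and fragment_text() here are illustrative, not the project's exact implementation:

```python
# Illustrative preprocessing: load, clean, sentence-split, and fragment a chapter.
import re
from typing import List

from nltk.tokenize import sent_tokenize

FRAGMENT_LENGTH = 150  # target words per fragment (see the configuration table)

def clean_text(text: str) -> str:
    """Normalize whitespace and strip special characters that confuse TTS/prompting."""
    text = re.sub(r"[^\w\s.,!?;:'-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def fragment_text(path: str) -> List[str]:
    """Split story.txt into ~FRAGMENT_LENGTH-word fragments made of whole sentences."""
    with open(path, encoding="utf-8") as f:
        text = clean_text(f.read())
    fragments, current, words = [], [], 0
    for sentence in sent_tokenize(text):
        current.append(sentence)
        words += len(sentence.split())
        if words >= FRAGMENT_LENGTH:
            fragments.append(" ".join(current))
            current, words = [], 0
    if current:
        fragments.append(" ".join(current))
    return fragments
```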
The following steps run in parallel per fragment (managing CPU/memory via process pools):
sequenceDiagram
User->>TextFragment: Process fragment
TextFragment->>TTS: Generate audio
TextFragment->>PromptEngine: Create prompt
TextFragment->>ImageGen: Generate image
loop Per fragment
TTS->>AudioFile: Save WAV/MP3
PromptEngine->>PromptFile: Save prompt text
ImageGen->>ImageFile: Save JPG
end
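In code, the fan-out might look roughly like the sketch below using concurrent.futures; the three step functions are placeholders for the project's own TTS, prompt, and image routines (on Windows the pool call would sit under an `if __name__ == "__main__"` guard):

```python
# Per-fragment fan-out sketch using a process pool.
from concurrent.futures import ProcessPoolExecutor, as_completed

def generate_voiceover(index, fragment):
    pass  # step A: text-to-speech -> audio/voiceover{i}.mp3

def build_image_prompt(index, fragment):
    pass  # step B: LLM/KeyBERT prompt -> text/image-prompts/image_prompt{i}.txt

def generate_image(index, fragment):
    pass  # step C: Stable Diffusion -> images/image{i}.jpg

def process_fragment(index, fragment):
    generate_voiceover(index, fragment)
    build_image_prompt(index, fragment)
    generate_image(index, fragment)
    return index

def process_all(fragments, max_workers=4):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_fragment, i, frag) for i, frag in enumerate(fragments)]
        for future in as_completed(futures):
            print(f"Fragment {future.result()} finished")
```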
A. Text-to-Speech (TTS)
- Engines:
- Edge TTS: Async via edge_tts.Communicate (default)
- ElevenLabs: Synthesizes via API if configured
- Process:
- Audio generated for each fragment.
- Saves as audio/voiceover{i}.mp3 or .wav.
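A hedged sketch of the default Edge TTS path via edge_tts.Communicate; the voice name is an example, not necessarily the project's default:

```python
# Minimal Edge TTS example: one fragment in, one voiceover file out.
import asyncio

import edge_tts

async def synthesize_fragment(index: int, text: str, voice: str = "en-US-GuyNeural") -> None:
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(f"audio/voiceover{index}.mp3")

asyncio.run(synthesize_fragment(0, "Once upon a time, in a kingdom by the sea..."))
```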
B. Prompt Generation Strategies:
- LLM-Based:
- ChatGPT: Asks "Craft a visual prompt from this scene".
- Ollama: Offline LLM for prompt generation.
- KeyBERT (fallback):
- Keyword extraction (NLTK + KeyBERT) if LLM fails.
- Output: Saved to text/image-prompts/image_prompt{i}.txt.
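If the LLM path fails, the keyword-based fallback might look like the sketch below; joining the keyphrases and the global style string into one comma-separated prompt is an assumption about the final prompt format, not the exact project logic:

```python
# KeyBERT fallback sketch: extract keyphrases and write them out as the image prompt.
from keybert import KeyBERT

kw_model = KeyBERT()

def fallback_prompt(index: int, fragment: str, style: str = "") -> str:
    keywords = kw_model.extract_keywords(
        fragment, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=5
    )
    prompt = ", ".join(keyword for keyword, _score in keywords)
    if style:
        prompt = f"{prompt}, {style}"
    with open(f"text/image-prompts/image_prompt{index}.txt", "w", encoding="utf-8") as f:
        f.write(prompt)
    return prompt
```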
C. Stable Diffusion Image Generation
- Backends:
- Local API (e.g., SD WebUI): Sends prompts to SD_URL.
- Pollinations: Cloud API with requests (faster but less control).
- Process:
- Uses the prompt file plus the global style description (STYLE_DESCRIPTION); see the sketch below.
- Saves the image as images/image{i}.jpg.
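For the local backend, a call against an AUTOMATIC1111-style txt2img endpoint could look like this; the generation settings (steps, resolution, timeout) are example values, not the project's defaults:

```python
# Illustrative request to a local Stable Diffusion WebUI (AUTOMATIC1111-style) API.
import base64

import requests

SD_URL = "http://localhost:7860"

def generate_image(index: int, prompt: str, style: str = "") -> None:
    payload = {
        "prompt": f"{prompt}, {style}" if style else prompt,
        "steps": 25,
        "width": 768,
        "height": 512,
    }
    response = requests.post(f"{SD_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
    response.raise_for_status()
    image_data = base64.b64decode(response.json()["images"][0])
    with open(f"images/image{index}.jpg", "wb") as f:
        f.write(image_data)
```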
MoviePy Workflow (per fragment):
graph LR
subgraph VideoClipProcess["VideoClipProcess{i}"]
Image --> ImageClip
Audio --> AudioClip
subgraph Compositing
ImageClip --> BG[Background]
TextClip --> FG[Foreground]
end
Compositing --> VideoClip
AudioClip --> VideoClip
end
- Audio Processing:
- Crossfades
- Silence padding
- Text Overlay:
- Captions on image/movie clips.
- Output: videos/video{i}.mp4.
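A per-fragment assembly sketch, assuming the MoviePy 1.x API (ImageClip/TextClip/CompositeVideoClip and the .set_* methods); font size, caption position, and fps are placeholder values:

```python
# Per-fragment clip assembly sketch (MoviePy 1.x API; caption styling values are examples).
from moviepy.editor import AudioFileClip, CompositeVideoClip, ImageClip, TextClip

def build_clip(index: int, caption: str) -> None:
    audio = AudioFileClip(f"audio/voiceover{index}.mp3")
    background = ImageClip(f"images/image{index}.jpg").set_duration(audio.duration)
    subtitle = (
        TextClip(caption, fontsize=36, color="white",
                 method="caption", size=(background.w, None))
        .set_duration(audio.duration)
        .set_position(("center", "bottom"))
    )
    clip = CompositeVideoClip([background, subtitle]).set_audio(audio)
    clip.write_videofile(f"videos/video{index}.mp4", fps=24, codec="libx264")
```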
Steps:
- Clip Sorter: Orders video*.mp4 numerically.
- Transition Layer:
- Crossfades/soft cuts between clips.
- Background music layering.
- Encoder:
- H.264 via MoviePy's write_videofile().
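A final-assembly sketch along these lines: numeric sorting, a short crossfade between clips, optional background music, and H.264 encoding; the fade length and music volume are assumptions:

```python
# Final assembly sketch: sort clips numerically, crossfade, mix in background music,
# and encode with H.264. Fade length and music volume are example values.
import glob
import re

from moviepy.editor import (AudioFileClip, CompositeAudioClip, VideoFileClip,
                            concatenate_videoclips)

def assemble_final(project_dir: str, music_path: str = None, fade: float = 0.5) -> None:
    paths = sorted(
        glob.glob(f"{project_dir}/videos/video*.mp4"),
        key=lambda p: int(re.search(r"video(\d+)\.mp4", p).group(1)),
    )
    clips = [VideoFileClip(p).crossfadein(fade) for p in paths]
    final = concatenate_videoclips(clips, method="compose", padding=-fade)
    if music_path:
        music = AudioFileClip(music_path).volumex(0.1).set_duration(final.duration)
        final = final.set_audio(CompositeAudioClip([final.audio, music]))
    final.write_videofile(f"{project_dir}/final.mp4", codec="libx264", fps=24)
```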
graph TD
A[story.txt] --> B[Preprocessing]
B --> |Sentences| C{Fragment Split}
C --> |Frag#1| D[TTS → Audio]
C --> |Frag#1| E[LLM → Prompt]
E --> H1["image_prompt{i}.txt"]
H1 --> F[Stable Diffusion → Image]
F --> I1["image{i}.jpg"]
D --> G1["voiceover{i}.wav"]
G1 & I1 --> J[MoviePy Clip]
J --> K["video{i}.mp4"]
subgraph Aggregation
K --> L[Final.mp4]
style Aggregation fill:#f9f
end
L --> M[User Watch]
style D fill:#f88,stroke:#cc0
style E fill:#d8d
style F fill:#a93
- Processing Mode:
- Fragment jobs run in parallel (via multiprocessing).
- IO-bound tasks (TTS, API calls) use async/threads.
- Resource Limits:
- Checks CPU, memory, and swap (uses psutil).
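The resource gate could be as simple as the sketch below; the thresholds shown are illustrative, while the real limits come from config.ini (e.g. FREE_SWAP):

```python
# Resource-limit gate sketch using psutil. Thresholds here are example values.
import time

import psutil

def wait_for_resources(max_cpu=85.0, min_free_ram_gb=2.0, min_free_swap_gb=1.0) -> None:
    """Block until there is enough headroom to start another fragment job."""
    while True:
        cpu = psutil.cpu_percent(interval=1)
        free_ram_gb = psutil.virtual_memory().available / 1024 ** 3
        free_swap_gb = psutil.swap_memory().free / 1024 ** 3
        if cpu < max_cpu and free_ram_gb > min_free_ram_gb and free_swap_gb > min_free_swap_gb:
            return
        time.sleep(5)  # back off and re-check
```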
graph LR
StartUserInput[Place story.txt] --> StartScript[python teller_of_tales.py]
StartScript --> LoadProject[Project folder setup]
LoadProject --> ProcessText[Split and fragment]
ProcessText --> TTSPipeline[TTS Processing]
ProcessText --> PromptGen[LLM Prompts]
TTSPipeline --> AudioFiles
PromptGen --> Prompts
Prompts --> ImageGen[Images via SD]
subgraph PerFragmentSteps["Per-fragment steps"]
AudioFiles --> ClipAssembly[Audio+Image→Video]
ImageGen --> ClipAssembly
ClipAssembly --> VideoFragments
end
This architecture balances parallelism against system load, preventing overload while leveraging modern APIs and affordable cloud services where needed.
| Parameter | Type | Default | Description |
|---|---|---|---|
| FRAGMENT_LENGTH | Int | 150 | Words per text fragment (lower = shorter clips, higher = longer) |
| FREE_SWAP | Int | 200 | Minimum GB of free swap space required before processing |
| DEBUG | Bool | no | Verbose logging and debug info |
| USE_CHATGPT | Bool | yes | Enable ChatGPT-based prompt generation |
| USE_ELEVENLABS | Bool | no | Enable premium ElevenLabs TTS |
| SD_URL | URL | localhost:7860 | Stable Diffusion API endpoint |
| OLLAMA_MODEL | String | llama2:7b | Ollama model specification |
| STYLE_DESCRIPTION | String | empty | Global art style applied to all images |
# config.ini snippet
[GENERAL]
FREE_SWAP=200 # GB free RAM for swapping
DEBUG=no
[AUDIO]
USE_ELEVENLABS=no # Or edge-tts
[IMAGE_PROMPT]
OLLAMA_MODEL=llama3.2:3b-instruct-q8_0 # Offline model path
[STABLE_DIFFUSION]
SD_URL=http://localhost:7860 # Local API URL
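These values can be read with the standard library's configparser; the sketch below shows how such a file is typically loaded, not necessarily the project's own loader (inline_comment_prefixes handles the trailing # comments shown above):

```python
# Reading config.ini with configparser (section/key names follow the snippet above).
import configparser

config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read("config.ini")

DEBUG = config.getboolean("GENERAL", "DEBUG", fallback=False)
FREE_SWAP = config.getint("GENERAL", "FREE_SWAP", fallback=200)
USE_ELEVENLABS = config.getboolean("AUDIO", "USE_ELEVENLABS", fallback=False)
OLLAMA_MODEL = config.get("IMAGE_PROMPT", "OLLAMA_MODEL", fallback="llama2:7b")
SD_URL = config.get("STABLE_DIFFUSION", "SD_URL", fallback="http://localhost:7860")
```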
- Python 3.8.10
- NVIDIA GPU with 4 GB of VRAM
- Create a new virtual env:
  py -3.8 -m venv env
- Activate your virtual env:
  env/Scripts/activate
- Install PyTorch from https://pytorch.org/get-started/locally/:
  pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
- Install packages from the included requirements.txt:
  pip install -r .\requirements.txt
- Install ImageMagick from https://imagemagick.org/script/download.php and check both boxes:
  * Associate supported file extensions
  * Install legacy utilities
- Add your OpenAI token from https://beta.openai.com/account/api-keys to your environment variables:
  setx OPENAI_TOKEN your_token
  Don't want to use an OpenAI account? No problem! Make sure that USE_CHATGPT in config.ini is set to no:
  USE_CHATGPT = no
- Log in to Hugging Face using your Access Token from https://huggingface.co/settings/tokens:
  huggingface-cli login

Place each book chapter in its own project folder:

projects/
├── my_first_story/
│ └── story.txt (Your book chapter goes here)
├── another_story/
│ └── story.txt
└── third_story/
└── story.txt
Each story.txt file can be as long as you want (longer texts take proportionally longer to process).
Run:

python teller_of_tales.py

The script automatically discovers all projects in the projects/ directory and processes them sequentially or in parallel, based on your configuration.
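Project discovery likely amounts to scanning the projects/ folder for subfolders that contain a story.txt; the sketch below shows one way to do that (the real script's logic may differ):

```python
# Project discovery sketch: find every project folder that has a story.txt.
from pathlib import Path

def discover_projects(root: str = "projects"):
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir() and (folder / "story.txt").exists():
            yield folder

for project in discover_projects():
    print(f"Processing {project.name} ...")
```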
Each project folder contains the complete processing pipeline output:
projects/my_first_story/
├── story.txt # Original input
├── audio/
│ ├── voiceover0.mp3
│ ├── voiceover1.mp3
│ └── ...
├── images/
│ ├── image0.jpg
│ ├── image1.jpg
│ └── ...
├── text/
│ └── image-prompts/
│ ├── image_prompt0.txt
│ ├── image_prompt1.txt
│ └── ...
├── videos/
│ ├── video0.mp4
│ ├── video1.mp4
│ └── ...
└── final.mp4 # 🎬 Your completed video!
