Fabula: AI-Powered Narrative Analysis Engine

Fabula is an AI-powered narrative analysis engine that transforms unstructured narrative texts (scripts, novels, etc.) into richly structured knowledge graphs. By combining LLM-driven extraction with a robust entity resolution pipeline, Fabula enables deep analysis of story structure, character development, and thematic elements.

Inspired by the BBC's Mythology Engine, Fabula aims to unlock narrative information and make it explorable through graph-based queries.

Core Features

Script Pre-processing: Converts TV/film scripts from various formats into standardized JSON using script2json.py
LLM-Based Entity Extraction: Uses BAML for structured extraction of:
- Characters (Agents)
- Locations
- Objects
- Organizations
- Events
- Relationships
Two-Pass Processing Pipeline:
- First Pass: Raw entity extraction
- Second Pass: Detailed scene metadata, events, and participations
Entity Resolution: Reconciles and merges duplicate entities using fuzzy matching and LLM-assisted resolution
Cypher Generation: Converts processed data to Neo4j Cypher queries for graph database import
Natural Language Graph Queries: Work-in-progress tool for converting natural language to Cypher queries
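
To make the fuzzy-matching half of entity resolution concrete, here is a minimal sketch of grouping near-duplicate names. It is not Fabula's implementation: it uses `difflib` from the standard library as a stand-in for `thefuzz`, and the function names, the 0.8 threshold, and the sample names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(names: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group names that resemble the first member of an existing group."""
    groups: list[list[str]] = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


# Illustrative names only; real input would come from the extraction pass.
names = ["Sally Sparrow", "SALLY SPARROW", "Sally Sparow", "The Doctor", "Larry"]
for group in group_duplicates(names):
    print(group)
```

String similarity alone cannot tell that two differently worded mentions refer to the same entity, which is one plausible reason to pair it with an LLM-assisted step.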

Quick Start

Prerequisites

# Required Python packages (requirements.txt coming soon)
pip install requests beautifulsoup4 thefuzz neo4j pydantic openai

You'll also need:

Neo4j Desktop (free version available)
Access to an LLM API (by default the code uses OpenAI o3-mini)

Basic Usage

Convert a script to JSON:

python script2json.py "http://chakoteya.net/DoctorWho/29-10.html" output.json

Process the script and generate the knowledge graph:

# Basic usage with default settings
python main.py input_file.json --output output_graph.json

# Using combined extraction mode (recommended for speed and efficiency)
python main.py input_file.json --combined --output output_graph.json

# Using fully combined extraction mode for max performance
python main.py input_file.json --fully-combined --output output_graph.json

# Using batch resolution for large datasets
python main.py input_file.json --batch-resolution --output output_graph.json

Convert the processed data to Cypher:

from json_cypher import main as generate_cypher
generate_cypher()

Processing Modes

Fabula offers several processing modes to balance speed, cost, and accuracy:

Standard Mode

Standard mode processes each entity type separately and is good for small datasets or when you need maximum precision.

python main.py input_file.json

Combined Mode (`--combined`)

Combined mode extracts all primary entities (agents, locations, organizations, objects) in a single pass, reducing API calls and improving consistency.

python main.py input_file.json --combined

Fully Combined Mode (`--fully-combined`) RECOMMENDED

This mode extracts both primary and secondary entities (events, participations, etc.) in combined calls, offering the best performance. This is recommended for most use cases.

python main.py input_file.json --fully-combined

Batch Resolution (`--batch-resolution`) RECOMMENDED

For large datasets, batch resolution runs entity resolution on smaller groups of entities rather than on the whole set at once, improving performance and reducing API costs.

python main.py input_file.json --batch-resolution
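
One way to see why smaller groups help: if resolution compares candidates pairwise, the number of comparisons grows quadratically with group size. The sketch below is an illustration under that assumption, not Fabula's code; `batched`, `pair_count`, and the batch size of 3 are hypothetical.

```python
from typing import Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


candidates = [f"entity_{i}" for i in range(7)]
batches = list(batched(candidates, batch_size=3))

print("pairs without batching:", pair_count(len(candidates)))
print("pairs with batching:", sum(pair_count(len(b)) for b in batches))
```

The trade-off in a scheme like this is that duplicates landing in different batches are never compared, so a real pipeline would need some way to reconcile across batches.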

Multiple Flags

You can combine multiple flags for optimal performance:

python main.py input_file.json --fully-combined --batch-resolution

Project Structure

fabula/
├── main.py                 # Main orchestration
├── episode_processor.py    # Episode-level processing
├── scene_processor.py      # Scene-level extraction
├── entity_registry.py      # Entity management/resolution
├── validation.py           # Reference validation
├── context.py              # Global context management
├── utils/
│   ├── script2json.py      # Script preprocessing
│   ├── fabula_graphrag.py  # Natural language query tool
│   └── json_cypher.py      # Generate graph as Cypher statements
└── baml_src/
    └── myth06.baml         # BAML extraction definitions

Key Components

Script Pre-processing (`utils/script2json.py`)

The script converter is designed to work with TV/film scripts from sources like chakoteya.net. It:

Parses HTML/text scripts into structured JSON
Extracts scene boundaries, dialogue, and stage directions
Handles multi-episode stories
Supports various script formats

Example output structure:

{
  "Story": "Blink",
  "Airdate": "2007-06-09",
  "Episodes": [
    {
      "Episode": "Episode One",
      "Scenes": [
        {
          "Scene": "WESTER DRUMLINS",
          "Dialogue": [
            {
              "Character": "SALLY",
              "Line": "Hello? Is someone there?"
            },
            {
              "Stage Direction": "Sally enters the abandoned house"
            }
          ]
        }
      ]
    }
  ]
}
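
A short sketch of reading this structure may help when writing downstream tooling. It relies only on the keys visible in the example above; the helper name `iter_lines` is hypothetical, and the assumption that every spoken entry carries both `Character` and `Line` comes from the sample rather than from a documented schema.

```python
import json
from typing import Iterator

# Sample mirroring the example output structure (Airdate omitted for brevity).
SAMPLE = """
{
  "Story": "Blink",
  "Episodes": [
    {
      "Episode": "Episode One",
      "Scenes": [
        {
          "Scene": "WESTER DRUMLINS",
          "Dialogue": [
            {"Character": "SALLY", "Line": "Hello? Is someone there?"},
            {"Stage Direction": "Sally enters the abandoned house"}
          ]
        }
      ]
    }
  ]
}
"""


def iter_lines(story: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (scene, character, line) for spoken lines, skipping stage directions."""
    for episode in story.get("Episodes", []):
        for scene in episode.get("Scenes", []):
            for entry in scene.get("Dialogue", []):
                if "Character" in entry and "Line" in entry:
                    yield scene.get("Scene", ""), entry["Character"], entry["Line"]


story = json.loads(SAMPLE)
for scene, character, line in iter_lines(story):
    print(f"[{scene}] {character}: {line}")
```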

Cypher Generation (`utils/json_cypher.py`)

Converts processed story data into Neo4j Cypher queries for graph database import. Features:

Generates schema cleanup and constraint creation
Creates nodes for all entity types
Establishes relationships between entities
Handles metadata and properties
Supports incremental updates
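
As an illustration of what node generation can look like, the sketch below builds a `MERGE` statement from a dictionary of properties. `MERGE` matches an existing node or creates it, which is one common way to make repeated imports safe. Whether `json_cypher.py` works this way is an assumption; the helper names, the `Agent` label, and the `uuid` and `name` properties are hypothetical.

```python
def cypher_string(value: str) -> str:
    """Quote a Python string as a single-quoted Cypher string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def merge_node(label: str, key: str, props: dict[str, str]) -> str:
    """Build a MERGE statement keyed on one property, setting the others."""
    # Labels and property names are interpolated directly, so restrict them
    # to plain identifiers.
    for identifier in (label, *props):
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    statement = f"MERGE (n:{label} {{{key}: {cypher_string(props[key])}}})"
    others = [f"n.{k} = {cypher_string(v)}" for k, v in props.items() if k != key]
    if others:
        statement += " SET " + ", ".join(others)
    return statement + ";"


print(merge_node("Agent", "uuid", {"uuid": "agent_sally", "name": "Sally"}))
```

When sending queries through the Neo4j driver instead of writing them to a file, query parameters are generally preferable to string interpolation.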

Natural Language Queries (`utils/fabula_graphrag.py`)

The fabula_graphrag.py tool lets you query your narrative knowledge graphs in natural language, so you can explore them without writing Cypher.

Features

Dynamic schema extraction from Neo4j database
LLM-based question decomposition for complex narrative inquiries
BAML-constrained Cypher generation for accurate queries
Multi-hop path exploration for deep narrative connections
Rich answer synthesis with narrative context

Usage

# Basic usage
python fabula_graphrag.py

# With custom Neo4j connection
python fabula_graphrag.py --uri bolt://localhost:7687 --username neo4j --password password

Example questions:

"What happens prior to the Doctor's arrival?"
"How does Josh's emotional state change throughout the episode?"
"Which characters visit the White House Situation Room?"
"What objects are significant to the main character's development?"

Implementation Details

The system:

Extracts and enriches the database schema with narrative semantics
Decomposes complex questions into focused sub-questions
Generates Cypher queries constrained by BAML templates
Traverses multi-hop relationships to find narrative connections
Synthesizes results into coherent narrative answers

The architecture combines BAML's structured output guarantees with rich narrative exploration capabilities, allowing both precise fact retrieval and deeper thematic analysis.
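
The decompose, generate, and synthesize steps can be sketched as a small function that takes each stage as a callable. This is a structural illustration only: every name below is hypothetical, schema extraction and multi-hop traversal are left out, and the stand-in stages replace the LLM and the database so the example runs on its own.

```python
from typing import Callable

Rows = list[dict]


def answer_question(
    question: str,
    decompose: Callable[[str], list[str]],
    to_cypher: Callable[[str], str],
    run_query: Callable[[str], Rows],
    synthesize: Callable[[str, Rows], str],
) -> str:
    """Split a question, query once per sub-question, then combine the rows."""
    rows: Rows = []
    for sub_question in decompose(question):
        rows.extend(run_query(to_cypher(sub_question)))
    return synthesize(question, rows)


# Stand-in stages; a real tool would call an LLM and a Neo4j session here.
def split_on_and(question: str) -> list[str]:
    return [part.strip() for part in question.split(" and ") if part.strip()]


def fake_cypher(sub_question: str) -> str:
    return f"// Cypher for: {sub_question}"


def fake_run(query: str) -> Rows:
    return [{"query": query}]


def count_rows(question: str, rows: Rows) -> str:
    return f"{len(rows)} result row(s) for: {question}"


print(answer_question(
    "Who visits the house and what do they find?",
    split_on_and, fake_cypher, fake_run, count_rows,
))
```

Passing the stages in as callables keeps each one testable in isolation, which matters when two of them depend on external services.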

Development Status

Implemented

Core extraction pipeline
Entity resolution with batch processing
Combined extraction modes
Basic validation
Script preprocessing
Cypher generation

In Progress

Natural language query tool
Enhanced entity resolution
Requirements specification
Documentation improvements
Test coverage

Planned

Multi-modal support (extract from video)
Additional script format support
Interactive visualization
API documentation
Performance optimizations

Contributing

Fabula is in active development, and we welcome:

Bug reports
Feature suggestions
Documentation improvements
Script format contributions
Ontology enhancements

Please open an issue to discuss potential changes.

License

[License TBD]

Acknowledgements

BBC Mythology Engine for inspiration
chakoteya.net for script resources
BAML team for the extraction framework
Neo4j community for graph database expertise

For more information or to report issues, please open a GitHub issue.

brandburner/fabula
