Convert PowerPoint presentations into semantically rich text using Vision Language Models.
ppt2desc is a command-line tool that converts PowerPoint presentations into detailed textual descriptions. PowerPoint presentations are an inherently visual medium that often convey complex ideas through a combination of text, graphics, charts, and other visual layouts. This tool uses vision language models to not only transcribe the text content but also interpret and describe the visual elements and their relationships, capturing the full semantic meaning of each slide in a machine-readable format.
- Convert PPT/PPTX files to semantic descriptions
- Process individual files or entire directories
- Support for visual elements interpretation (charts, graphs, figures)
- Rate limiting for API calls
- Customizable prompts and instructions
- JSON output format for easy integration
Current Model Provider Support
- Gemini models via Google Gemini API
- GPT Models via OpenAI API
- Claude Models via Anthropic API
- Gemini Models via Google Cloud Platform Vertex AI
- GPT Models via Microsoft Azure AI Foundry Deployments
- Nova & Claude Models via Amazon Bedrock on Amazon Web Services
- Python 3.9 or higher
- LibreOffice (for PPT/PPTX to PDF conversion)
- Option 1: Install LibreOffice locally.
- Option 2: Use the provided Docker container for LibreOffice.
- API credentials for a vision language model provider
- Clone the repository:
git clone https://github.com/ALucek/ppt2desc.git
cd ppt2desc
- Installing LibreOffice
LibreOffice is a critical dependency for this tool, as it handles the headless conversion of PowerPoint files to PDF format.
Option 1: Local Installation
Linux Systems:
sudo apt install libreoffice
macOS:
brew install libreoffice
Windows:
Install using the installer from LibreOffice's official website
Option 2: Docker-based Installation
a. Ensure you have Docker installed on your system
b. Run the following command
docker compose up -d
This command will build the Docker image based on the provided Dockerfile and start the container in detached mode. The LibreOffice conversion service will be accessible at http://localhost:2002. To take it down, run docker compose down.
- Create and activate a virtual environment:
python -m venv ppt2desc_venv
source ppt2desc_venv/bin/activate # On Windows: ppt2desc_venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
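For orientation, the headless PPT/PPTX-to-PDF conversion that LibreOffice provides can be driven from Python roughly as follows. This is a sketch under stated assumptions, not ppt2desc's actual code: the function names are hypothetical, and it relies only on the standard soffice options --headless, --convert-to, and --outdir.

```python
import subprocess
from pathlib import Path


def build_convert_command(soffice: str, pptx: Path, out_dir: Path) -> list:
    """Build a headless LibreOffice command that converts one deck to PDF.

    `soffice` is the path to the LibreOffice binary (hypothetical parameter name).
    """
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(pptx),
    ]


def convert_to_pdf(soffice: str, pptx: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(soffice, pptx, out_dir), check=True)
    # LibreOffice names the output file after the input file's stem.
    return out_dir / f"{pptx.stem}.pdf"
```

Separating command construction from execution keeps the command easy to inspect without LibreOffice installed.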
Basic usage with Gemini API:
python src/main.py \
--input_dir /path/to/presentations \
--output_dir /path/to/output \
--libreoffice_path /path/to/soffice \
--client gemini \
--api_key YOUR_GEMINI_API_KEY
General Arguments:
- --input_dir: Path to input directory or PPT file (required)
- --output_dir: Output directory path (required)
- --client: LLM client to use: 'gemini', 'vertexai', 'anthropic', 'azure', 'aws' or 'openai' (required)
- --model: Model to use (default: "gemini-1.5-flash")
- --instructions: Additional instructions for the model
- --libreoffice_path: Path to LibreOffice installation
- --libreoffice_url: URL for Docker-based LibreOffice installation (configured: http://localhost:2002)
- --rate_limit: API calls per minute (default: 60)
- --prompt_path: Custom prompt file path
- --api_key: Model Provider API key (if not set via environment variable)
- --save_pdf: Include to save the converted PDF in your output folder
- --save_images: Include to save the individual slide images in your output folder
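The --rate_limit argument caps API calls per minute. One simple way such a cap could be enforced is to space calls at least 60 / rate_limit seconds apart. The sketch below illustrates that idea only; the RateLimiter class and its parameters are hypothetical and may differ from how ppt2desc implements rate limiting.

```python
import time


class RateLimiter:
    """Space out calls so that at most `calls_per_minute` happen per minute."""

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        if calls_per_minute <= 0:
            raise ValueError("calls_per_minute must be positive")
        self.min_interval = 60.0 / calls_per_minute
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        # Record the moment at which the call is actually allowed to proceed.
        self._last_call = now + delay
        return delay
```

Calling limiter.wait() immediately before each model request would keep a run with --rate_limit 30 to one request every two seconds.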
Vertex AI Specific Arguments:
- --gcp_project_id: GCP project ID for Vertex AI service account
- --gcp_region: GCP region for Vertex AI service (e.g., us-central1)
- --gcp_application_credentials: Path to GCP service account JSON credentials file
Azure AI Foundry Specific Arguments:
- --azure_openai_api_key: Azure AI Foundry Resource Key 1 or Key 2
- --azure_openai_endpoint: Azure AI Foundry deployment service endpoint link
- --azure_deployment_name: The name of your model deployment
- --azure_api_version: Azure API version (default: "2023-12-01-preview")
AWS Amazon Bedrock Specific Arguments:
- --aws_access_key_id: Bedrock Account Access Key
- --aws_secret_access_key: Bedrock Account Secret Access Key
- --aws_region: AWS Bedrock Region
Using Gemini API:
python src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client gemini \
--model gemini-1.5-flash \
--rate_limit 30 \
--instructions "Focus on extracting numerical data from charts and graphs"
Using Vertex AI:
python src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--client vertexai \
--libreoffice_path ./soffice \
--gcp_project_id my-project-123 \
--gcp_region us-central1 \
--gcp_application_credentials ./service-account.json \
--model gemini-1.5-pro \
--instructions "Extract detailed information from technical diagrams"
Using Azure AI Foundry:
python src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client azure \
--azure_openai_api_key 123456790ABCDEFG \
--azure_openai_endpoint 'https://example-endpoint-001.openai.azure.com/' \
--azure_deployment_name gpt-4o \
--azure_api_version 2023-12-01-preview \
--rate_limit 60
Using AWS Amazon Bedrock:
python src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client aws \
--model us.amazon.nova-lite-v1:0 \
--aws_access_key_id 123456790ABCDEFG \
--aws_secret_access_key 123456790ABCDEFG \
--aws_region us-east-1 \
--rate_limit 60
The tool generates JSON files with the following structure:
{
"deck": "presentation.pptx",
"model": "model-name",
"slides": [
{
"number": 1,
"content": "Detailed description of slide content..."
},
// ... more slides
]
}
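Downstream code can consume this structure with the standard json module. The following is a minimal sketch that relies only on the fields shown above ("deck", "slides", "number", "content"); the function name parse_descriptions is hypothetical, and the output file name in the usage note is an assumption.

```python
import json


def parse_descriptions(text):
    """Return (deck_name, {slide_number: description}) from one output file's text."""
    data = json.loads(text)
    slides = {slide["number"]: slide["content"] for slide in data["slides"]}
    return data["deck"], slides
```

For example, assuming an output file at ./output/presentation.json, calling parse_descriptions(open("./output/presentation.json", encoding="utf-8").read()) would return the deck name and a dictionary mapping slide numbers to descriptions.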
When using the Docker container for LibreOffice, you can use the --libreoffice_url argument to direct the conversion process to the container's API endpoint, rather than a local installation.
python src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_url http://localhost:2002 \
--client vertexai \
--model gemini-1.5-pro \
--gcp_project_id my-project-123 \
--gcp_region us-central1 \
--gcp_application_credentials ./service-account.json \
--rate_limit 30 \
--instructions "Extract detailed information from technical diagrams" \
--save_pdf \
--save_images
You should use either --libreoffice_url or --libreoffice_path, but not both.
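An either-or constraint like this is commonly expressed with an argparse mutually exclusive group. The sketch below shows the pattern in isolation; it is not taken from src/main.py, and making the group required is an assumption.

```python
import argparse


def build_parser():
    """Parser sketch covering only the two LibreOffice source options."""
    parser = argparse.ArgumentParser(description="LibreOffice source selection (sketch)")
    # Assumption: exactly one of the two options must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--libreoffice_path", help="Path to a local soffice binary")
    source.add_argument("--libreoffice_url", help="URL of the Docker-based service")
    return parser
```

With this setup, passing both options makes argparse exit with a usage error before any conversion starts.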
You can modify the base prompt by editing src/prompt.txt or by providing additional instructions via the command line:
python src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client gemini \
--instructions "Include mathematical equations and formulas in LaTeX format"
For Consumer APIs:
- Set your API key via the --api_key argument or through your respective provider's environment variables
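The precedence between the --api_key argument and an environment variable can be sketched as below. This is illustrative only: the resolve_api_key function is hypothetical, the variable names in ENV_VARS are assumptions that should be checked against each provider's SDK documentation, and ppt2desc's own lookup logic may differ.

```python
import os

# Assumed variable names; verify against your provider's SDK documentation.
ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(client, cli_key=None, env=None):
    """Prefer an explicit key from the command line, else fall back to the environment."""
    env = os.environ if env is None else env
    if cli_key:
        return cli_key
    var = ENV_VARS.get(client)
    key = env.get(var) if var else None
    if not key:
        raise ValueError(
            f"No API key for client {client!r}: pass --api_key or set {var}"
        )
    return key
```

Keeping the key in an environment variable avoids leaving it in shell history, which is one reason to prefer that route over the command-line argument.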
For Vertex AI:
- Create a service account in your GCP project IAM
- Grant necessary permissions (typically, "Vertex AI User" role)
- Download the service account JSON key file
- Provide the credentials file path via --gcp_application_credentials
For Azure AI Foundry:
- Create an Azure OpenAI Resource
- Navigate to Azure AI Foundry and choose the subscription and Azure OpenAI Resource to work with
- Under management select deployments
- Select create new deployment and configure with your vision LLM
- Provide the deployment name, API key, endpoint, and API version via --azure_deployment_name, --azure_openai_api_key, --azure_openai_endpoint, and --azure_api_version
For AWS Bedrock:
- Request access to serverless model deployments in Amazon Bedrock's model catalog
- Create a user in your AWS IAM
- Enable Amazon Bedrock access policies for your user
- Save the user's access key and secret access key
- Provide the user's credentials via --aws_access_key_id and --aws_secret_access_key
Contributions are welcome! Please feel free to submit a Pull Request.
Todo
- Handling Google's new genai SDK for a unified Gemini/Vertex AI experience
- Better Docker Setup
- AWS Llama Vision Support Confirmation
- Combination of JSON files across multiple ppts
- Dynamic font handling (conversion currently struggles when a font used by the presentation is not installed on the machine)
This project is licensed under the MIT License - see the LICENSE file for details.
- LibreOffice for PPT/PPTX conversion
- PyMuPDF for PDF processing