Releases: gptscript-ai/gptparse
v0.3.0
GPTParse v0.3.0
New Features
- Added OCR mode for direct text extraction from PDFs and images
  - Supports PDF, PNG, JPG/JPEG files
  - Fast local processing without requiring AI services
  - Optional abort-on-error flag for better error handling
- Enhanced CLI interface with four distinct processing modes:
  - Vision mode (AI-powered)
  - Fast mode (local processing)
  - Hybrid mode (combined approach)
  - OCR mode (direct text extraction)
Improvements
- Added support for processing image files (PNG, JPG/JPEG) in OCR mode
- Enhanced error handling and reporting
- Improved documentation with comprehensive examples for all modes
Technical Details
- Introduced new DoclingHandler for OCR processing
- Updated CLI interface to support OCR commands and options
- Added abort-on-error functionality for OCR processing
Usage
# New OCR mode examples
gptparse ocr document.pdf --output_file output.md
gptparse ocr scan.png --output_file output.md
gptparse ocr document.pdf --output_file output.md --abort-on-error
For full documentation and examples, please see the README.md.
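When scripting the CLI, it can help to check the file type against the chosen mode before launching it. The sketch below is a hypothetical helper, not part of gptparse: the `SUPPORTED` table and the `build_command` function are assumptions built only from what these release notes say about each mode (fast and hybrid for PDFs; vision and OCR for PDFs and PNG/JPG/JPEG images), and the only flags it emits are the ones shown in the examples above.

```python
from pathlib import Path

# Extensions each mode accepts, as read from these release notes.
# This table is an assumption, not taken from the gptparse source code.
SUPPORTED = {
    "fast": {".pdf"},
    "hybrid": {".pdf"},
    "vision": {".pdf", ".png", ".jpg", ".jpeg"},
    "ocr": {".pdf", ".png", ".jpg", ".jpeg"},
}


def build_command(mode, input_path, output_path, abort_on_error=False):
    """Return the argv list for one gptparse invocation (hypothetical helper)."""
    if mode not in SUPPORTED:
        raise ValueError(f"unknown mode: {mode}")

    # Compare extensions case-insensitively so SCAN.PNG is treated like scan.png.
    suffix = Path(input_path).suffix.lower()
    if suffix not in SUPPORTED[mode]:
        raise ValueError(f"{mode} mode is not documented for '{suffix}' files")

    # The notes only describe --abort-on-error for OCR processing.
    if abort_on_error and mode != "ocr":
        raise ValueError("--abort-on-error is only described for ocr mode")

    cmd = ["gptparse", mode, str(input_path), "--output_file", str(output_path)]
    if abort_on_error:
        cmd.append("--abort-on-error")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once gptparse is installed. Building the command as a list rather than a string avoids shell quoting problems with file names that contain spaces.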
v0.2.0
GPTParse v0.2.0 - Multi-Mode Processing & Extended Format Support
GPTParse now supports both PDF documents and images (PNG, JPG, JPEG), offering multiple processing modes for conversion to Markdown. Choose between local processing, AI vision models, or a combination of both for optimal results.
Major New Features:
Extended Format Support
- PDF Documents: Full support for single and multi-page PDFs
- Image Files: Direct processing of PNG, JPG, and JPEG files
- Preserved Structure: Maintains tables, lists, and embedded images across all formats
Multiple Processing Modes
- Fast Mode: Local PDF processing using pymupdf4llm - no AI required
- Vision Mode: High-fidelity conversion of PDFs and images using Vision Language Models (VLMs)
- Hybrid Mode: Enhanced accuracy by combining fast and vision processing for PDFs
Improved Model Support
- Latest Claude 3.5 Sonnet model integration
- Pinned to latest stable releases for consistent performance
- Maintained support for OpenAI, Anthropic, and Google models
Quick Start:
# Install the latest version
pip install gptparse
# Process PDFs:
# Fast local processing (no AI required)
gptparse fast document.pdf --output_file output.md
# AI-powered vision processing
export OPENAI_API_KEY="your-openai-api-key"
gptparse vision document.pdf --output_file output.md
# Enhanced hybrid processing
gptparse hybrid document.pdf --output_file output.md
# Process Images:
gptparse vision screenshot.png --output_file output.md
gptparse vision photo.jpg --output_file output.md
Supported Models:
- OpenAI: gpt-4o (default), gpt-4o-mini
- Anthropic: claude-3-5-sonnet-latest (default), claude-3-opus-latest
- Google: gemini-1.5-pro-002 (default), gemini-1.5-flash-002, gemini-1.5-flash-8b
Breaking Changes:
- Anthropic models now use the latest tag (for example, claude-3-5-sonnet-latest) instead of specific dates
- Minimum Python version requirement remains at 3.9
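Scripts or configuration files written for v0.1.0 may still carry the dated Anthropic model names. A minimal sketch of one way to migrate them follows; `to_latest` and `LATEST_ALIASES` are hypothetical names, and the alias set contains only the two Anthropic models this page lists for v0.2.0, so any other dated name is rejected rather than guessed at.

```python
import re

# Anthropic aliases listed under "Supported Models" for v0.2.0 on this page.
LATEST_ALIASES = {"claude-3-5-sonnet-latest", "claude-3-opus-latest"}

# Matches a trailing eight-digit date such as "-20240620".
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def to_latest(model):
    """Map a dated Anthropic model name to its -latest alias (hypothetical helper)."""
    if model in LATEST_ALIASES:
        return model  # already in the new form

    candidate, replaced = _DATE_SUFFIX.subn("-latest", model)
    if replaced == 0 or candidate not in LATEST_ALIASES:
        raise ValueError(f"no -latest alias listed for {model}")
    return candidate
```

For example, `to_latest("claude-3-5-sonnet-20240620")` yields `claude-3-5-sonnet-latest`, while `claude-3-haiku-20240307` raises an error because the v0.2.0 list shows no matching alias.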
For full documentation, usage examples, and contribution guidelines, please refer to the README.md.
This release significantly expands GPTParse's capabilities with support for multiple file formats and processing modes, making it even more versatile for your document processing needs.
v0.1.2
v0.1.1
v0.1.0
GPTParse v0.1.0 - Initial Release
GPTParse is a powerful document parser designed for Retrieval-Augmented Generation (RAG) systems, enabling seamless conversion of PDF documents to Markdown format using advanced vision language models (VLMs).
Key Features:
- Convert complex PDFs to well-structured Markdown, preserving tables, lists, and images
- Support for multiple AI providers: OpenAI, Anthropic, and Google
- Flexible usage as both a Python library and CLI application
- Customizable processing options, including page selection and system prompts
- Detailed statistics for token usage and processing times
Quick Start:
pip install gptparse
export OPENAI_API_KEY="your-openai-api-key"
gptparse vision example.pdf --output_file output.md
Supported Models:
- OpenAI: gpt-4o (default), gpt-4o-mini
- Anthropic: claude-3-5-sonnet-20240620 (default), claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
- Google: gemini-1.5-pro-002 (default), gemini-1.5-flash-002, gemini-1.5-flash-8b
For full documentation, usage examples, and contribution guidelines, please refer to the README.md.
We're excited to introduce GPTParse and look forward to your feedback and contributions!