Releases: gptscript-ai/gptparse

v0.3.0

12 Nov 01:50

GPTParse v0.3.0

New Features

  • Added OCR mode for direct text extraction from PDFs and images
    • Supports PDF, PNG, JPG/JPEG files
    • Fast local processing without requiring AI services
    • Optional abort-on-error flag for better error handling
  • Enhanced CLI interface with four distinct processing modes:
    • Vision mode (AI-powered)
    • Fast mode (local processing)
    • Hybrid mode (combined approach)
    • OCR mode (direct text extraction)

Improvements

  • Added support for processing image files (PNG, JPG/JPEG) in OCR mode
  • Enhanced error handling and reporting
  • Improved documentation with comprehensive examples for all modes

Technical Details

  • Introduced a new DoclingHandler for OCR processing (sketched below)
  • Updated CLI interface to support OCR commands and options
  • Added abort-on-error functionality for OCR processing
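
The OCR path is built on Docling. Below is a minimal sketch of the kind of conversion a Docling-based handler performs, with a hypothetical abort_on_error switch mirroring the CLI flag; GPTParse's actual DoclingHandler implementation may differ.

# Minimal sketch of a Docling-based OCR handler (hypothetical; GPTParse's
# actual DoclingHandler may be structured differently)
from pathlib import Path

from docling.document_converter import DocumentConverter  # local OCR/layout engine


def ocr_to_markdown(paths, abort_on_error=False):
    """Convert PDFs or images to Markdown locally, without calling an AI service."""
    converter = DocumentConverter()
    parts = []
    for path in paths:
        try:
            result = converter.convert(str(path))
            parts.append(result.document.export_to_markdown())
        except Exception as exc:
            # Mirrors the --abort-on-error CLI flag: stop at the first failure
            if abort_on_error:
                raise
            print(f"Skipping {path}: {exc}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    markdown = ocr_to_markdown([Path("document.pdf"), Path("scan.png")], abort_on_error=True)
    Path("output.md").write_text(markdown, encoding="utf-8")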

Usage

# New OCR mode examples
gptparse ocr document.pdf --output_file output.md
gptparse ocr scan.png --output_file output.md
gptparse ocr document.pdf --output_file output.md --abort-on-error

For full documentation and examples, please see the README.md.

v0.2.0

04 Nov 22:42

GPTParse v0.2.0 - Multi-Mode Processing & Extended Format Support

GPTParse now supports both PDF documents and images (PNG, JPG, JPEG), offering multiple processing modes for conversion to Markdown. Choose between local processing, AI vision models, or a combination of both for optimal results.

Major New Features:

Extended Format Support

  • PDF Documents: Full support for single and multi-page PDFs
  • Image Files: Direct processing of PNG, JPG, and JPEG files
  • Preserved Structure: Maintains tables, lists, and embedded images across all formats

Multiple Processing Modes

  • Fast Mode: Local PDF processing using pymupdf4llm - no AI required (see the sketch after this list)
  • Vision Mode: High-fidelity conversion of PDFs and images using Vision Language Models (VLMs)
  • Hybrid Mode: Enhanced accuracy by combining fast and vision processing for PDFs
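
Fast mode's local path relies on pymupdf4llm, which converts a PDF straight to Markdown with no AI calls. A minimal sketch of that underlying conversion (GPTParse adds its own CLI handling and options around it):

# Sketch of the pymupdf4llm conversion that fast mode builds on
import pathlib

import pymupdf4llm  # pip install pymupdf4llm

md_text = pymupdf4llm.to_markdown("document.pdf")  # returns Markdown as a string
pathlib.Path("output.md").write_text(md_text, encoding="utf-8")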

Improved Model Support

  • Latest Claude 3.5 Sonnet model integration
  • Pinned to latest stable releases for consistent performance
  • Maintained support for OpenAI, Anthropic, and Google models

Quick Start:

# Install the latest version
pip install gptparse

# Process PDFs:
# Fast local processing (no AI required)
gptparse fast document.pdf --output_file output.md

# AI-powered vision processing
export OPENAI_API_KEY="your-openai-api-key"
gptparse vision document.pdf --output_file output.md

# Enhanced hybrid processing
gptparse hybrid document.pdf --output_file output.md

# Process Images:
gptparse vision screenshot.png --output_file output.md
gptparse vision photo.jpg --output_file output.md

Supported Models:

  • OpenAI: gpt-4o (default), gpt-4o-mini
  • Anthropic: claude-3-5-sonnet-latest (default), claude-3-opus-latest
  • Google: gemini-1.5-pro-002 (default), gemini-1.5-flash-002, gemini-1.5-flash-8b

Breaking Changes:

  • Anthropic models now use the latest tag (e.g., claude-3-5-sonnet-latest) instead of date-pinned versions
  • Minimum Python version requirement remains at 3.9

For full documentation, usage examples, and contribution guidelines, please refer to the README.md.

This release significantly expands GPTParse's capabilities with support for multiple file formats and processing modes, making it even more versatile for your document processing needs.

v0.1.2

23 Oct 07:44

This release allows users to use the latest Claude 3.5 Sonnet model. Going forward, GPTParse will be pinned to the latest Claude 3.5 Sonnet release.

v0.1.1

19 Oct 02:41

Several improvements to the post-installation user experience.

v0.1.0

18 Oct 22:24

GPTParse v0.1.0 - Initial Release

GPTParse is a powerful document parser designed for Retrieval-Augmented Generation (RAG) systems, enabling seamless conversion of PDF documents to Markdown format using advanced vision language models (VLMs).

Key Features:

  • Convert complex PDFs to well-structured Markdown, preserving tables, lists, and images
  • Support for multiple AI providers: OpenAI, Anthropic, and Google
  • Flexible usage as both a Python library and CLI application
  • Customizable processing options, including page selection and system prompts
  • Detailed statistics for token usage and processing times

Quick Start:

pip install gptparse
export OPENAI_API_KEY="your-openai-api-key"
gptparse vision example.pdf --output_file output.md
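
Because GPTParse is also usable as a Python library, the same conversion can be scripted. The import path, function name, and parameters below are illustrative assumptions, not the documented API; see the README for the actual interface.

# Hypothetical library-style usage; the real module layout and signature
# are documented in the README and may differ from this sketch
import os

from gptparse.modes.vision import vision  # assumed import path

os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key")

result = vision(
    file_path="example.pdf",   # source PDF
    model="gpt-4o",            # default OpenAI model per these notes
    output_file="output.md",   # write the Markdown here
)
print(result)  # assumed to include token usage and processing-time statistics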

Supported Models:

  • OpenAI: gpt-4o (default), gpt-4o-mini
  • Anthropic: claude-3-5-sonnet-20240620 (default), claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
  • Google: gemini-1.5-pro-002 (default), gemini-1.5-flash-002, gemini-1.5-flash-8b

For full documentation, usage examples, and contribution guidelines, please refer to the README.md.

We're excited to introduce GPTParse and look forward to your feedback and contributions!