⚡ Tablers

A blazingly fast PDF table extraction library with a Python API, powered by Rust

Features

🚀 Blazingly Fast - Core algorithms written in Rust for maximum performance
🐍 Pythonic API - Easy-to-use Python interface with full type hints
📄 Edge Detection - Accurate table detection using line and rectangle edge analysis
📝 Text Extraction - Extract text content from table cells with configurable settings
📤 Multiple Export Formats - Export tables to CSV, Markdown, and HTML
🔐 Encrypted PDFs - Support for password-protected PDF documents
💾 Memory Efficient - Lazy page loading for handling large PDF files
🖥️ Cross-Platform - Works on Windows, Linux, and macOS

Why Tablers?

This project draws significant inspiration from the table extraction modules of pdfplumber and PyMuPDF. Compared to pdfplumber and PyMuPDF, tablers has the following advantages:

High Performance: Core PDF processing is implemented in Rust
More Configurable: Supports customizable table filter settings such as min_rows, min_columns, and include_single_cell (see this issue)
Clean Python Dependencies: No external Python dependencies required
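
To make the filter settings above more concrete, here is a minimal sketch in plain Python of what thresholds named min_rows, min_columns, and include_single_cell could mean when applied to detected tables. It does not use tablers itself: the TableShape type, the keep_table helper, and the exact semantics are assumptions made for illustration, and the library's real parameter handling may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableShape:
    """Hypothetical stand-in describing a detected table's grid size."""
    rows: int
    columns: int


def keep_table(shape, min_rows=1, min_columns=1, include_single_cell=False):
    """Return True if a table passes the (assumed) filter thresholds."""
    # A 1x1 "table" is often just a box on the page; drop it unless asked.
    if shape.rows == 1 and shape.columns == 1:
        return include_single_cell
    return shape.rows >= min_rows and shape.columns >= min_columns


candidates = [TableShape(1, 1), TableShape(2, 3), TableShape(5, 1)]
kept = [s for s in candidates if keep_table(s, min_rows=2, min_columns=2)]
print(kept)
```

In this sketch only the 2x3 candidate survives: the single-cell box is dropped by default, and the 5x1 candidate has too few columns.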

Benchmark

[Figure: performance comparison of tablers, PyMuPDF, and pdfplumber for PDF table extraction]

For more details, please refer to the tablers-benchmark repository.

Note

This solution is primarily designed for text-based PDFs and does not support scanned PDFs.

Installation

pip install tablers

Quick Start

Basic Table Extraction

from tablers import Document, find_tables

# Open a PDF document
doc = Document("example.pdf")

# Extract tables from each page
for page in doc.pages():
    tables = find_tables(page, extract_text=True)
    for table in tables:
        print(f"Found table with {len(table.cells)} cells")
        for cell in table.cells:
            print(f"  Cell: {cell.text} at {cell.bbox}")

doc.close()
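
The loop above prints cells one by one. As a rough illustration of how cell bounding boxes relate to a grid, the sketch below groups cells into rows and serializes them with the standard csv module. It is not the library's own export path (tablers lists CSV, Markdown, and HTML export among its features). The Cell stand-in, the cells_to_rows and rows_to_csv helpers, the tolerance value, and the assumption that bbox is (x0, top, x1, bottom) with a top-left origin are all hypothetical.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for a cell: only .text and .bbox are used.
# Assumption: bbox is (x0, top, x1, bottom) with the origin at the top-left.
Cell = namedtuple("Cell", ["text", "bbox"])


def cells_to_rows(cells, tolerance=2.0):
    """Group cells into rows by their top coordinate, ordered left to right."""
    rows = []
    for cell in sorted(cells, key=lambda c: (c.bbox[1], c.bbox[0])):
        # Stay in the current row while the top edge is within the tolerance
        # of the row's first cell; otherwise start a new row.
        if rows and abs(rows[-1][0].bbox[1] - cell.bbox[1]) <= tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    # Order each row by x0 and replace missing text with an empty string.
    return [
        [c.text or "" for c in sorted(row, key=lambda c: c.bbox[0])]
        for row in rows
    ]


def rows_to_csv(rows):
    """Serialize rows to a CSV string using the standard library."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


cells = [
    Cell("Qty", (60.0, 10.0, 110.0, 30.0)),
    Cell("Item", (10.0, 10.0, 60.0, 30.0)),
    Cell("Pen", (10.0, 30.5, 60.0, 50.0)),
    Cell(None, (60.0, 30.0, 110.0, 50.0)),
]
print(rows_to_csv(cells_to_rows(cells)))
```

With real data, the same helpers could be fed table.cells, provided the bbox layout matches the assumption stated above. Merged cells are not handled by this sketch.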

Using Context Manager

from tablers import Document, find_tables

with Document("example.pdf") as doc:
    page = doc.get_page(0)  # Get first page
    tables = find_tables(page, extract_text=True)

    for table in tables:
        print(f"Table bbox: {table.bbox}")
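
The with form is generally preferable because cleanup still runs when extraction raises. The sketch below shows the pattern with a hypothetical stand-in class, FakeDocument, rather than tablers itself. It assumes that leaving the with block has the same effect as calling close(), which is how context managers conventionally behave but is not confirmed here for Document.

```python
class FakeDocument:
    """Hypothetical stand-in that mimics open/close behaviour only."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the resource, then let any exception propagate.
        self.close()
        return False


doc = FakeDocument("example.pdf")
try:
    with doc:
        raise ValueError("extraction failed")
except ValueError:
    pass

print(doc.closed)
```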

For more advanced usage, please refer to the documentation.

Requirements

Python >= 3.10
Supported platforms: Windows (x64), Linux (x64) with glibc >= 2.34, macOS (ARM64)
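
As a convenience, the requirements above can be checked before installing. The following is a minimal sketch using only the standard library. The is_supported and parse_version helpers are hypothetical, and the check is based solely on the platforms listed above, so it says nothing about whether other environments might work by other means.

```python
import platform
import sys


def parse_version(text):
    """Turn '2.35' into (2, 35); return None when parsing fails."""
    try:
        return tuple(int(part) for part in text.split(".")[:2])
    except ValueError:
        return None


def is_supported(python_version, system, machine, libc=("", "")):
    """Check an environment against the requirements listed above."""
    if tuple(python_version[:2]) < (3, 10):
        return False
    machine = machine.lower()
    if system == "Windows":
        return machine in ("amd64", "x86_64")
    if system == "Darwin":
        return machine == "arm64"
    if system == "Linux":
        name, version = libc
        parsed = parse_version(version)
        return (
            machine == "x86_64"
            and name == "glibc"
            and parsed is not None
            and parsed >= (2, 34)
        )
    return False


print(
    is_supported(
        sys.version_info,
        platform.system(),
        platform.machine(),
        platform.libc_ver(),
    )
)
```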

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

pdfium-render - Rust bindings for PDFium
PyO3 - Rust bindings for Python

