PDF to Text

A PDF text extraction app that takes a PDF document as input and returns either a single txt file containing all pages or a compressed folder of txt files, one per page. OCR can also be enabled for scanned documents.

How does it work?

flowchart LR

A[PDF] --> |text conversion / OCR| B(Text)
B --> |Option 1| D[txt file]
B --> |Option 2| E[ZIP folder of txt files for pages]

  1. Upload your PDF.
  2. Enable OCR (for scanned documents).
  3. Select the PDF language.
  4. Download your output file (zip/txt).
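The two output options can be illustrated with a minimal sketch. It assumes the page texts have already been extracted (by text conversion or OCR) into a list of strings; the function names `pages_to_txt` and `pages_to_zip` and the `page_N.txt` naming scheme are hypothetical and not taken from the app's source.

```python
import io
import zipfile


def pages_to_txt(pages: list[str]) -> str:
    """Option 1: join all page texts into one string for a single txt file."""
    # Assumption: pages are separated by a blank line in the combined file.
    return "\n\n".join(pages)


def pages_to_zip(pages: list[str]) -> bytes:
    """Option 2: write each page to its own txt file inside an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(pages, start=1):
            # Hypothetical naming scheme: page_1.txt, page_2.txt, ...
            archive.writestr(f"page_{number}.txt", text)
    return buffer.getvalue()
```

Both functions return data in memory (a string or bytes), which is the kind of value a web app can hand to a download button without writing temporary files to disk.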

How to support the project

You can support the project by giving feedback and/or buying me a coffee.