Extract both texts and tables and images on the same page #1105

washyouself · 2024-03-06T08:24:50Z

I am a beginner with pdfplumber. I would like to extract the text, tables, and images on a page in their natural order. How should I do this? Is there a code example I can refer to? All the APIs I've seen extract one kind of element at a time: all the text, all the tables, or all the images.

I want to be able to get a page object and parse the PDF in a natural order.

jsvine (Maintainer) · 2024-03-11T15:06:25Z

The tricky bit here is "natural order" — an order may seem natural to a human, but isn't necessarily encoded in the PDF. "Natural order" logic can probably be custom coded for specific types of PDFs, but that particular logic will depend on how those PDFs are arranged.

There's also a question of "extract to what?", since tables have a different sort of representation than plain text or images.
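
As a rough illustration of the kind of custom "natural order" logic described above, here is a minimal sketch. It assumes a simple single-column page where "natural order" just means top-to-bottom, then left-to-right; multi-column or more complex layouts would need different logic. The helper names (`center_inside`, `order_elements`, `page_elements`) and the `(kind, payload)` output shape are made up for this example and are not part of pdfplumber. The pdfplumber calls it relies on are `page.extract_text_lines()`, `page.find_tables()` (with `.bbox` and `.extract()` on each table), and `page.images`. To sidestep the "extract to what?" question, text is returned as a string, a table as a list of rows, and an image as its bounding box only.

```python
def center_inside(obj, bbox):
    """Return True if the midpoint of obj lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    h_mid = (obj["x0"] + obj["x1"]) / 2
    v_mid = (obj["top"] + obj["bottom"]) / 2
    return x0 <= h_mid < x1 and top <= v_mid < bottom


def order_elements(text_lines, tables, images):
    """Merge text lines, tables, and images into one list sorted by position.

    text_lines: dicts with "text", "x0", "top", "x1", "bottom"
    tables:     (bbox, rows) pairs, where bbox = (x0, top, x1, bottom)
    images:     dicts with "x0", "top", "x1", "bottom"
    Returns a list of (kind, payload) tuples (a shape chosen for this sketch).
    """
    table_bboxes = [bbox for bbox, _ in tables]
    elements = []
    for line in text_lines:
        # Skip text that sits inside a table so it is not reported twice.
        if any(center_inside(line, bbox) for bbox in table_bboxes):
            continue
        elements.append((line["top"], line["x0"], "text", line["text"]))
    for bbox, rows in tables:
        elements.append((bbox[1], bbox[0], "table", rows))
    for img in images:
        img_bbox = (img["x0"], img["top"], img["x1"], img["bottom"])
        elements.append((img["top"], img["x0"], "image", img_bbox))
    # Assumption: top-to-bottom, then left-to-right, is "natural" for this page.
    elements.sort(key=lambda e: (e[0], e[1]))
    return [(kind, payload) for _, _, kind, payload in elements]


def page_elements(page):
    """Collect the elements of a pdfplumber page and put them in reading order."""
    text_lines = page.extract_text_lines()
    tables = [(t.bbox, t.extract()) for t in page.find_tables()]
    return order_elements(text_lines, tables, page.images)


# Usage (the file name is a placeholder):
#
#   import pdfplumber
#   with pdfplumber.open("example.pdf") as pdf:
#       for kind, payload in page_elements(pdf.pages[0]):
#           print(kind, payload)
```

The sorting step is kept separate from the pdfplumber calls so that the ordering rule can be swapped out for a given family of PDFs (for example, sorting by column first) without touching the extraction code.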
