Extract both texts and tables and images on the same page #1105
Unanswered
washyouself
asked this question in
Q&A
Replies: 1 comment
-
The tricky bit here is "natural order": an order may seem natural to a human but isn't necessarily encoded in the PDF. "Natural order" logic can probably be custom-coded for specific types of PDFs, but that logic will depend on how those PDFs are arranged. There's also the question of "extract to what?", since tables have a different sort of representation than plain text or images.
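For what it's worth, here is a minimal sketch of one such custom ordering, not an official pdfplumber recipe. It assumes a simple single-column layout where "natural order" just means top-to-bottom, then left-to-right, and it answers "extract to what?" with a flat list of dicts tagged by `kind`. The helper names (`in_bbox`, `reading_order`, `collect_page_elements`, `dump_page`) and the file name `example.pdf` are made up for illustration; `page.find_tables()`, `page.images`, and `page.extract_text_lines()` are pdfplumber calls, the last of which I believe is only available in recent releases.

```python
def in_bbox(obj, bbox):
    """True if the object's center lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    cx = (obj["x0"] + obj["x1"]) / 2
    cy = (obj["top"] + obj["bottom"]) / 2
    return x0 <= cx <= x1 and top <= cy <= bottom


def reading_order(elements):
    """Sort top-to-bottom, then left-to-right (assumes a single column)."""
    return sorted(elements, key=lambda e: (round(e["top"]), e["x0"]))


def collect_page_elements(page):
    """Gather tables, images, and text lines from a pdfplumber page."""
    elements = []

    tables = page.find_tables()
    for table in tables:
        x0, top, x1, bottom = table.bbox
        elements.append({"kind": "table", "x0": x0, "top": top, "x1": x1,
                         "bottom": bottom, "content": table.extract()})

    for img in page.images:
        # Only the position is used here; the image dict is passed through.
        elements.append({"kind": "image", "x0": img["x0"], "top": img["top"],
                         "x1": img["x1"], "bottom": img["bottom"],
                         "content": img})

    table_bboxes = [t.bbox for t in tables]
    for line in page.extract_text_lines():
        # Skip text that sits inside a table, so it is not reported twice.
        if any(in_bbox(line, bbox) for bbox in table_bboxes):
            continue
        elements.append({"kind": "text", "x0": line["x0"], "top": line["top"],
                         "x1": line["x1"], "bottom": line["bottom"],
                         "content": line["text"]})

    return reading_order(elements)


def dump_page(path, page_number=0):
    """Print the kind and vertical position of each element on one page."""
    import pdfplumber  # imported here so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        for el in collect_page_elements(pdf.pages[page_number]):
            print(el["kind"], round(el["top"]))
```

Calling `dump_page("example.pdf")` would list the elements of the first page in that order. Multi-column pages, sidebars, or floating figures would need a different sort key, which is exactly the part that depends on how the PDFs are arranged.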
-
I am a beginner with pdfplumber. I would like to extract the text, tables, and images on the same page in natural order. How should I do this? Is there a code example for reference? All the APIs I've seen extract one kind of element at a time: all the text, all the tables, or all the images.
I want to be able to take a page object and parse the PDF in natural order.