I get nothing from this PDF. No Text or Table #1124
robbanp
started this conversation in
Ask for help with specific PDFs
Replies: 1 comment
-
It appears that the text in that PDF is rasterized (converted from actual information about the characters to an image version of them). You can test this by opening the PDF in standard PDF-viewing software and attempting to select or copy-paste the text. To resolve the problem, you can try running the PDF through optical character recognition (OCR) software, but be aware that |
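For anyone who wants to check this programmatically rather than by trying to select text in a viewer, here is a minimal sketch. It assumes a page with zero character objects and at least one embedded image is probably a scan. That is only a heuristic, not something pdfplumber itself reports, and the helper names `looks_rasterized` and `report` are made up for this example. `page.chars`, `page.images`, and `page.page_number` are regular pdfplumber page attributes.

```python
from types import SimpleNamespace


def looks_rasterized(page):
    """Heuristic: no character objects, but at least one embedded image."""
    return len(page.chars) == 0 and len(page.images) > 0


def report(path):
    """Print per-page character and image counts for the PDF at `path`."""
    # Imported here so looks_rasterized() can be tried without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            print(
                page.page_number,
                len(page.chars),
                len(page.images),
                looks_rasterized(page),
            )


# Stand-in objects (not real pdfplumber pages) to show what the check looks at.
scanned_like = SimpleNamespace(chars=[], images=[{"name": "Im0"}])
text_like = SimpleNamespace(chars=[{"text": "a"}], images=[])
```

Calling `report("../data/tst.pdf")` on the file from the question would list those counts for each page. If every page shows 0 characters and one or more images, `extract_words()` has nothing to work with and OCR is the likely next step.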
-
I'm not sure whether the cause is my settings or the data. I tried multiple PDFs from the same source, with the same result.
tst.pdf
import pdfplumber
print(pdfplumber.__version__)
pdf = pdfplumber.open("../data/tst.pdf")
p0 = pdf.pages[1]
p0.extract_words()
returns []
...
im = p0.to_image(resolution=75)
im
returns
p0.extract_table()
returns
[['', '', '', '', '', '', '', ''],
['', '', '', '', '', '', '', ''],
[None, '', None, None, None, None, None, None],
['', '', '', '', '', '', '', ''],
[None, '', None, None, None, None, None, None],
['', '', None, None, None, None, None, None],
[None, '', None, None, None, None, None, None],
['', '', '', '', '', '', '', ''],
[None, '', None, None, None, None, None, None],
['', '', '', '', '', '', '', ''],
[None, '', None, None, None, None, None, None]]