I get nothing from this PDF. No Text or Table #1124

robbanp · 2024-04-13T13:02:10Z


I'm not sure whether the problem is my settings or the data. I tried multiple PDFs from the same source.
tst.pdf

```python
import pdfplumber

print(pdfplumber.__version__)
pdf = pdfplumber.open("../data/tst.pdf")
p0 = pdf.pages[1]  # pages are 0-indexed, so this is the second page
p0.extract_words()
```

returns []

...

```python
im = p0.to_image(resolution=75)
im
```

returns an image of the page.

```python
p0.extract_table()
```

returns

```
[['', '', '', '', '', '', '', ''],
 ['', '', '', '', '', '', '', ''],
 [None, '', None, None, None, None, None, None],
 ['', '', '', '', '', '', '', ''],
 [None, '', None, None, None, None, None, None],
 ['', '', None, None, None, None, None, None],
 [None, '', None, None, None, None, None, None],
 ['', '', '', '', '', '', '', ''],
 [None, '', None, None, None, None, None, None],
 ['', '', '', '', '', '', '', ''],
 [None, '', None, None, None, None, None, None]]
```

jsvine · 2024-04-14T22:11:37Z

Maintainer

It appears that the text in that PDF is rasterized, i.e., stored as an image of the characters rather than as actual information about the characters. You can test this by opening the PDF in standard PDF-viewing software and trying to select or copy-paste the text; if you can't, the text is most likely part of an image.
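The same check can be done programmatically. This is a minimal sketch, and the helper names `looks_rasterized` and `rasterized_pages` are hypothetical, not part of pdfplumber. It relies on a pdfplumber `Page` exposing `.chars` and `.images` as lists of dicts: a page that has image objects but no character objects very likely has its text baked into an image. Treat it as a heuristic only; for example, text drawn as vector outlines would also yield no chars.

```python
def looks_rasterized(page) -> bool:
    """Heuristic check for a page whose text exists only as an image.

    Assumes a pdfplumber-style Page: `page.chars` and `page.images`
    are lists of dicts describing character and image objects.
    """
    has_text = len(page.chars) > 0
    has_images = len(page.images) > 0
    return not has_text and has_images


def rasterized_pages(pdf) -> list[int]:
    """Return the 0-based indices of pages that look rasterized (hypothetical helper)."""
    return [i for i, page in enumerate(pdf.pages) if looks_rasterized(page)]


# Example usage (not run here):
# import pdfplumber
# with pdfplumber.open("../data/tst.pdf") as pdf:
#     print(rasterized_pages(pdf))
```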

To resolve the problem, you can try running the PDF through optical character recognition (OCR) software, but be aware that pdfplumber does not work as well with OCR'ed PDFs as with PDFs that contain native text.
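One way to do the OCR step from Python, as a sketch: this assumes the `ocrmypdf` command-line tool (which uses Tesseract) is installed and on `PATH`. The thread does not name a specific OCR program, and the helper names below are hypothetical. OCRmyPDF writes a copy of the PDF with an invisible text layer, which pdfplumber can then read, subject to the accuracy caveat about OCR'ed PDFs.

```python
import subprocess


def build_ocr_command(src: str, dst: str, language: str = "eng") -> list[str]:
    """Build an ocrmypdf command that writes an OCR'ed copy of `src` to `dst`."""
    # -l picks the Tesseract language pack(s), e.g. "eng" or "eng+deu".
    return ["ocrmypdf", "-l", language, src, dst]


def ocr_then_extract_words(src: str, dst: str, page_index: int = 0, language: str = "eng"):
    """Run OCR on `src`, then extract words from one page of the OCR'ed copy."""
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(src, dst, language), check=True)

    import pdfplumber  # imported here so build_ocr_command has no third-party dependency

    with pdfplumber.open(dst) as pdf:
        return pdf.pages[page_index].extract_words()


# Example usage (not run here):
# words = ocr_then_extract_words("../data/tst.pdf", "../data/tst_ocr.pdf", page_index=1)
```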
