How to adjust line thickness (aka parser seeing empty cells)? #1092
enrac5
started this conversation in
Ask for help with specific PDFs
Hi @enrac5, and thanks for your interest in this library. Semi-regular tables like this can be a bit difficult to work with at first. The key to understanding them is that each row array contains an entry for every horizontal cell "position" in the table. In this case, the right border of the [...] It's not ideal, and something I'd like to make more intuitive / better-structured in future versions of [...]
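To make the "one entry per horizontal position" idea concrete, here is a minimal sketch in plain Python. It assumes that each `None` marks a position covered by the cell to its left (which is how the rows in the question read), and the helper name `cell_spans` is hypothetical, not part of pdfplumber.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def cell_spans(row: Row) -> List[Tuple[str, int]]:
    """Collapse a row into (text, width) pairs.

    Assumption: a None entry is a horizontal position covered by the
    nearest non-None cell to its left, so it widens that cell by one.
    """
    spans: List[Tuple[str, int]] = []
    for cell in row:
        if cell is None and spans:
            text, width = spans[-1]
            spans[-1] = (text, width + 1)
        else:
            # A leading None has nothing to attach to; keep it as an empty cell.
            spans.append((cell or "", 1))
    return spans


# The second row from the question's printout.
header = ['Shot', 'Footage/\nTimecode', None, 'Shot Description/Dialogue', None,
          'Title', 'Start', 'End', 'Dur.', 'Subtitle/Spotting']

for text, width in cell_spans(header):
    print(width, repr(text))
```

Read this way, the `None` after the timecode entry is not a missing value: under the stated assumption, that cell simply occupies two of the table's ten horizontal positions.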
-
I'm trying to parse a PDF (see attached)
Test_R1.pdf
    import pdfplumber

    def process_PDF(pdf_path: str):
        # Print every row of every table found on every page.
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                tables = page.extract_tables()
                for table in tables:
                    for row in table:
                        print(row)
I get a printout like this:
['COMBINED CONTINUITY & DIALOGUE', None, None, None, None, 'MASTER ENGLISH SUBTITLE/SPOTTING LIST', None, None, None, None]
['Shot', 'Footage/\nTimecode', None, 'Shot Description/Dialogue', None, 'Title', 'Start', 'End', 'Dur.', 'Subtitle/Spotting']
['', None, None, None, None, None, None, None, None, None]
Notice that in the second row there's a None entry after 'Footage/\nTimecode'. The table debugger shows this image (attached).
That image doesn't immediately explain why I'm seeing this output. Does anyone have any ideas about what I'm doing wrong here? For what it's worth, other PDFs (all Word docs printed to PDF) don't seem to behave like this.
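One way to narrow this down, sketched here in plain Python, is to check which horizontal positions are `None` in every row of the printout. The helper name `always_empty_positions` is hypothetical, and the check only covers the three rows shown above, so it hints at where the extra positions sit rather than proving it.

```python
from typing import List, Optional

Row = List[Optional[str]]


def always_empty_positions(rows: List[Row]) -> List[int]:
    """Return the column indexes that are None in every given row."""
    # zip(*rows) walks the table column by column (rows assumed equal length).
    return [
        index
        for index, column in enumerate(zip(*rows))
        if all(cell is None for cell in column)
    ]


# The three rows from the printout above.
rows = [
    ['COMBINED CONTINUITY & DIALOGUE', None, None, None, None,
     'MASTER ENGLISH SUBTITLE/SPOTTING LIST', None, None, None, None],
    ['Shot', 'Footage/\nTimecode', None, 'Shot Description/Dialogue', None,
     'Title', 'Start', 'End', 'Dur.', 'Subtitle/Spotting'],
    ['', None, None, None, None, None, None, None, None, None],
]

print(always_empty_positions(rows))  # → [2, 4]
```

Positions 2 and 4 never hold text in these rows, which suggests (but does not confirm) that they come from extra vertical edges inside the 'Footage/Timecode' and 'Shot Description/Dialogue' columns.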