Inconsistent Table Detection #1131
Replies: 2 comments 1 reply
Hi @dlasusa, and thanks for sharing this example. The bad news:
Thanks @jsvine! I ended up iterating over the PDF twice, with two different sets of table settings. On the first pass I tracked which pages had no results, and on the second pass I iterated over just those pages and combined the results. I wasn't sure if I was doing something wrong.
My first pass table settings:
table_settings = {
    "explicit_vertical_lines": [17, 528],
    "horizontal_strategy": "lines",
    "snap_y_tolerance": 10,
    "intersection_tolerance": 10,
    "join_tolerance": 300
}
Second pass:
table_settings2 = {
    "explicit_vertical_lines": [17, 528],
    "explicit_horizontal_lines": [60, 190],
    "horizontal_strategy": "lines",
    "snap_tolerance": 10,
    "intersection_tolerance": 10,
    "join_tolerance": 300
}
By adding some
Thank you for your suggestion and your time!
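For anyone landing here later, the two-pass approach described in this reply could look roughly like the sketch below. It is not the poster's actual code: the helper names `extract_two_pass` and `extract_from_pdf` are made up for illustration, and the only pdfplumber calls assumed are `pdfplumber.open(...)`, `pdf.pages`, and `page.extract_tables(table_settings=...)`. The two settings dicts are copied from the reply.

```python
# Settings copied from the reply above.
FIRST_PASS = {
    "explicit_vertical_lines": [17, 528],
    "horizontal_strategy": "lines",
    "snap_y_tolerance": 10,
    "intersection_tolerance": 10,
    "join_tolerance": 300,
}
SECOND_PASS = {
    "explicit_vertical_lines": [17, 528],
    "explicit_horizontal_lines": [60, 190],
    "horizontal_strategy": "lines",
    "snap_tolerance": 10,
    "intersection_tolerance": 10,
    "join_tolerance": 300,
}


def extract_two_pass(pages, first, second):
    """Hypothetical helper: return {page_index: tables} using two passes."""
    results = {}
    empty = []
    # First pass: try the primary settings and remember pages with no tables.
    for i, page in enumerate(pages):
        tables = page.extract_tables(table_settings=first)
        if tables:
            results[i] = tables
        else:
            empty.append(i)
    # Second pass: retry only the empty pages with the fallback settings.
    for i in empty:
        results[i] = pages[i].extract_tables(table_settings=second)
    # Return results in page order.
    return dict(sorted(results.items()))


def extract_from_pdf(path):
    """Hypothetical wrapper: open a PDF and run both passes over its pages."""
    import pdfplumber  # imported here so the helper above works on any page-like objects

    with pdfplumber.open(path) as pdf:
        return extract_two_pass(pdf.pages, FIRST_PASS, SECOND_PASS)
```

Keeping the fallback limited to pages that came back empty means the explicit horizontal lines in the second settings dict never affect pages that were already detected correctly.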
I don't think I can share the pdf since it's got police data in it, but I'm sharing images of the issue (I've removed some info from the images, but I don't think it's relevant).
In the PDF, when a page has multiple entries, it seems to detect the tables as expected. But if there is a single item, it puts a box around the item, but I don't get the blue "table" color.
What I expect:
When there is just one item:
Environment:
Win11
Python 3.12.2
pdfplumber 0.11.0