Help with table extraction settings when not all columns in table have a vertical line #1134

sfc-gh-svadakath · 2024-04-30T22:25:04Z

Hi, I'd like to extract all five columns of the Schedule A table (pages 5-36) in the attached PDF. The challenge is that pdfplumber only picks up the middle three columns, because the leftmost and rightmost columns have no enclosing vertical lines. See screenshot. Is there a way to use table settings to extract all five columns for just the Schedule A table? I'd appreciate any pointers. Thank you.

Here's my code snippet looking at just page 5.

```python
import pdfplumber
import pandas as pd

# Set display options to show all rows and columns
pd.set_option('display.max_rows', None)  # None means unlimited
pd.set_option('display.max_columns', None)  # None means unlimited

# Specify the path to your PDF file
pdf_file_path = "/Users/svadakath/Data/AlertPDF/RBC.pdf"

pdf = pdfplumber.open(pdf_file_path)
page = pdf.pages[5]  # note: pdf.pages is zero-indexed

# Attempt 1: default table settings
table = page.extract_table()
pd.DataFrame(table[1:], columns=table[0])  # first row as header

# Attempt 2: explicit "lines" strategies
table_settings = {
    "vertical_strategy": "lines",
    "horizontal_strategy": "lines",
}
table = page.extract_table(table_settings)
pd.DataFrame(table[1:], columns=table[0])
```

RBC.pdf

jsvine · 2024-05-15T17:55:44Z

Maintainer

Hi @sfc-gh-svadakath, and thanks for your interest in pdfplumber. The solution proposed in this comment may work for your situation, too: #617 (comment)

Let us know if not.
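
For readers who cannot follow the link, here is a minimal sketch of one approach that can work when a table's outermost columns lack vertical rules. It is not necessarily what the linked comment proposes. The idea is to keep the `"lines"` strategies and add two vertical lines through pdfplumber's `explicit_vertical_lines` table setting, placed where the horizontal rules start and end. The helper names (`outer_x_bounds`, `settings_with_outer_lines`, `extract_rows`) are hypothetical, and the sketch assumes the horizontal rules of the table span all five columns.

```python
def outer_x_bounds(edges):
    """Return (leftmost x0, rightmost x1) across the horizontal edges.

    `edges` is a list of dicts shaped like pdfplumber's `page.edges`
    entries, each with "orientation", "x0" and "x1" keys.
    """
    horizontal = [e for e in edges if e["orientation"] == "h"]
    if not horizontal:
        raise ValueError("no horizontal edges found")
    return (
        min(e["x0"] for e in horizontal),
        max(e["x1"] for e in horizontal),
    )


def settings_with_outer_lines(x_left, x_right):
    """Keep the drawn lines and add two explicit vertical lines."""
    return {
        "vertical_strategy": "lines",
        "horizontal_strategy": "lines",
        # Explicit lines are combined with whatever the strategy finds.
        "explicit_vertical_lines": [x_left, x_right],
    }


def extract_rows(pdf_path, page_indexes):
    """Collect table rows from the given zero-based page indexes."""
    # Imported here so the two helpers above have no dependencies.
    import pdfplumber

    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for i in page_indexes:
            page = pdf.pages[i]
            x_left, x_right = outer_x_bounds(page.edges)
            settings = settings_with_outer_lines(x_left, x_right)
            table = page.extract_table(settings)
            if table:  # extract_table returns None when nothing is found
                rows.extend(table)
    return rows
```

A call such as `extract_rows(pdf_file_path, range(4, 36))` would cover printed pages 5-36 only if the printed page numbers match the PDF's page order, which is an assumption here. If a page also contains header or footer rules wider than the table, cropping to the table region first with `page.crop(...)` should keep those rules from skewing the computed bounds.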
