Need to identify "correct" carriage returns #1130
enrac5 started this conversation in Ask for help with specific PDFs
Replies: 2 comments
Hi @jsvine, any thoughts on this one?
Try passing "text_layout": True in the table settings:
table = page.extract_table({"text_layout": True})
print(table[0][2])
Returns:
For more on how to adjust the layout parameters, see the
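Once the cell text keeps its vertical spacing, paragraph detection can be reduced to splitting on blank lines. The sketch below is only an illustration under that assumption: `split_paragraphs` is a hypothetical helper, not part of pdfplumber, and it assumes the layout-preserved text separates paragraphs with at least one empty or whitespace-only line.

```python
import re


def split_paragraphs(cell_text):
    """Split layout-preserved cell text into paragraphs at blank lines.

    Hypothetical helper (not a pdfplumber function). Assumes paragraphs
    are separated by one or more empty or whitespace-only lines.
    """
    if not cell_text:
        return []
    # A newline, optional whitespace, then another newline marks a paragraph gap.
    blocks = re.split(r"\n\s*\n", cell_text.strip())
    # Collapse line breaks and layout padding inside each paragraph to single spaces.
    return [" ".join(block.split()) for block in blocks if block.strip()]


# Made-up sample text standing in for a table cell.
sample = (
    "First line of a speech\n"
    "continues here.\n"
    "\n"
    "   \n"
    "Second speaker replies\n"
    "on two lines."
)
print(split_paragraphs(sample))
```

With the snippet above, this would be called as split_paragraphs(table[0][2]). If the PDF's paragraph gaps are too small to produce blank lines, the layout parameters would need adjusting before a split like this can work.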
simple_table.pdf
I have a table with text in one of its columns (see attached). The text is split into paragraphs (usually representing dialogue), and I need to correctly identify which sections of text are paragraphs. Right now, using the extract_tables(...) method, the text in the third column has a line break after every line, which makes sense but makes paragraph detection difficult. Any ideas on how I can correctly identify the separate blocks of text?