Need help with complex pdf with tables across multiple pages #1209
learning-frog
started this conversation in
Ask for help with specific PDFs
I am trying to extract table data from this PDF, which contains two types of tables that span multiple pages:
1. Tables with the same column names spread across multiple pages (pages 62-64).
2. Tables with different columns across multiple pages (pages 36-37).
I’m using the following logic to join tables that span multiple pages:
I maintain two counters: one for the previous table and one for the current table.
If the previous table is on the right, the current table is on top, and the headers don’t match, I classify it as the second type of table.
If the previous table is on the right, the current table is on top, and the headers match, I classify it as the first type of table.
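A minimal sketch of the header-matching step only, written as plain Python so it does not depend on any particular extraction call. It assumes each extracted table is a list of rows (lists of strings or `None`), which is the shape pdfplumber's `page.extract_tables()` returns, and that the header row is repeated at the top of each continuation page. The names `normalize` and `stitch` are hypothetical helpers, not part of any library, and the position checks described above are not modeled here.

```python
from typing import List, Optional

Row = List[Optional[str]]
Table = List[Row]


def normalize(header: Row) -> List[str]:
    """Collapse whitespace and case so small extraction differences don't break matching."""
    return [" ".join((cell or "").split()).lower() for cell in header]


def stitch(fragments: List[Table]) -> List[Table]:
    """Merge consecutive fragments whose header rows match (the first table type).

    A fragment with a different header starts a new logical table in this sketch.
    Deciding whether it is really a continuation with different columns (the
    second type) needs the position checks and is deliberately left out.
    """
    merged: List[Table] = []
    for fragment in fragments:
        if not fragment:
            continue  # skip empty extractions
        if merged and normalize(merged[-1][0]) == normalize(fragment[0]):
            # Same header as the previous logical table: append the body rows
            # and drop the repeated header row.
            merged[-1].extend(list(row) for row in fragment[1:])
        else:
            # Different header: start a new logical table (copy the rows so the
            # input fragments are not mutated by later merges).
            merged.append([list(row) for row in fragment])
    return merged


# Illustrative data only; these rows are made up, not taken from the PDF.
page_a = [["Name", "Amount"], ["a", "1"]]
page_b = [["Name ", "amount"], ["b", "2"]]
page_c = [["ID", "Region"], ["7", "EU"]]
result = stitch([page_a, page_b, page_c])
```

With the made-up rows above, the first two fragments end up as one table and the third stays separate. If the continuation pages do not repeat the header, this comparison would treat the first data row as a header, so that case needs a different check.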
However, I’m struggling to find table settings that extract these tables correctly. I’ve experimented with different configurations, but none of them has extracted the tables successfully.
Any advice or assistance would be greatly appreciated!