Extract tables with pre-defined header #1122
Unanswered
QuentinAndre11
asked this question in
Q&A
Replies: 1 comment
Thanks for the kind words about |
0 replies
Hi!
First of all, thank you for this amazing library!
For my project, I want to extract several tables with a pre-defined layout: I don't know the lines that I must use, but for each table, I know the header (name of the columns and sub-columns) and a representative point which should be contained in the table.
My first approach would be to directly extract tables, then map every table to its representative point, then post-process the table to merge columns if needed (merge per sub-column, divide if some columns have been accidentally merged...).
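The mapping step of this first approach could look roughly like the sketch below. It assumes each extracted table exposes a bounding box as an `(x0, top, x1, bottom)` tuple and each representative point is an `(x, y)` pair in the same coordinate system; the function names are hypothetical and not part of any library.

```python
from typing import Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, top, x1, bottom)
Point = Tuple[float, float]               # (x, y) in the same coordinates


def contains(bbox: BBox, point: Point) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x0, top, x1, bottom = bbox
    x, y = point
    return x0 <= x <= x1 and top <= y <= bottom


def map_tables_to_points(
    table_bboxes: List[BBox], points: Dict[str, Point]
) -> Dict[str, Optional[int]]:
    """Map each named representative point to the index of the first
    table whose bounding box contains it, or None if no table does."""
    mapping: Dict[str, Optional[int]] = {}
    for name, point in points.items():
        mapping[name] = next(
            (i for i, bbox in enumerate(table_bboxes) if contains(bbox, point)),
            None,
        )
    return mapping
```

A point that maps to `None` would indicate that the table finder missed that table, which is a useful signal before starting the column post-processing.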
Another approach would be to first find the column names in the layout, and then use their positions as lines to constrain the table finder. However, this seems very expensive, even when using the spatial index (R-tree) I built for the layout to restrict the search area for the text.
So my final approach would be to constrain the table finder directly with the provided text (the list of column names). Do you think this would be possible?
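To make the idea concrete, here is a minimal sketch of how known column names could be turned into vertical line positions. It assumes words are available as dicts with `text`, `x0` and `x1` keys and that every column name is a single word on the same header row; `header_vertical_lines` is a hypothetical helper, not an existing function of the library.

```python
from typing import Dict, List, Sequence, Union

Word = Dict[str, Union[str, float]]  # assumed keys: "text", "x0", "x1"


def header_vertical_lines(
    words: Sequence[Word], column_names: Sequence[str]
) -> List[float]:
    """Derive x positions of column boundaries from known header words.

    Boundaries are placed at the left edge of the first header, halfway
    between each pair of adjacent headers, and at the right edge of the
    last header.
    """
    found = []
    for name in column_names:
        # Take the first word whose text matches the column name exactly.
        match = next((w for w in words if w["text"] == name), None)
        if match is None:
            raise ValueError(f"header {name!r} not found")
        found.append(match)

    # Order headers left to right regardless of the order they were given in.
    found.sort(key=lambda w: w["x0"])

    lines = [found[0]["x0"]]
    for left, right in zip(found, found[1:]):
        lines.append((left["x1"] + right["x0"]) / 2)
    lines.append(found[-1]["x1"])
    return lines
```

If the library in question is pdfplumber (an assumption, since the name is not stated above), positions like these could in principle be passed through the `explicit_vertical_lines` table setting together with `"vertical_strategy": "explicit"`. The outer boundaries here hug the header text, so they would likely need padding when cells are wider than their headers.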
Thank you very much,
QA
PS: I can try to anonymize my data if you think it is relevant.