Extracting a table with vertical text gives unreadable results #942
You can try modifying the default text extraction options, e.g. page.extract_table(dict(text_vertical_ttb=False)), which gives:
[['Sl.\nNo.',
'District',
'Population\n2012-13\nlakhs)\nProjected\n(In\nfor',
'88%\nto lakhs)\nAdult\nEquivalent\n(In',
'400gms/adult/day)\nConsumption\ntonnes)\nrequirement\nLakh\nTotal\n(In\n(@',
'Requirement seeds, wastage) tonnes)\n(Including\nLakh\n&\nfeeds\nTotal\n(In',
'Production (Rice)\n(In Lakh tonnes)',
None,
None,
'Surplus/Defi cit\n(In Lakh\ntonnes)',
None],
[None,
None,
None,
None,
None,
None,
'Kharif',
'Rabi',
'Total',
'Rice',
'Paddy']]
...
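In the output above, None marks a cell covered by a merged neighbour: "Production (Rice)" spans the Kharif/Rabi/Total columns, and the first six cells of the second row are merged with the row above. If you want one flat name per column, a post-processing step along these lines may help. This is a minimal sketch, not part of pdfplumber: flatten_header_rows is a hypothetical helper, and it assumes exactly two header rows, where None in the top row means "merged with the cell to the left" and None in the bottom row means "merged with the cell above". The sample rows are an abridged copy of the output above.

```python
from typing import List, Optional

Cell = Optional[str]


def _clean(cell: Cell) -> str:
    """Collapse the newlines pdfplumber keeps inside cell text into single spaces."""
    return " ".join(cell.split()) if cell else ""


def flatten_header_rows(top: List[Cell], bottom: List[Cell]) -> List[str]:
    """Combine a two-row table header into one list of column names.

    Hypothetical helper. Assumes None in the top row means "merged with the
    cell to the left" and None in the bottom row means "merged with the cell above".
    """
    if len(top) != len(bottom):
        raise ValueError("header rows must have the same number of cells")
    names: List[str] = []
    parent = ""
    for upper, lower in zip(top, bottom):
        if upper is not None:
            # A new top-level header starts here; it may span several columns.
            parent = _clean(upper)
        child = _clean(lower)
        names.append(f"{parent} / {child}" if child else parent)
    return names


# Abridged header rows from the extract_table output above.
top = ["Sl.\nNo.", "District", "Production (Rice)\n(In Lakh tonnes)", None, None,
       "Surplus/Defi cit\n(In Lakh\ntonnes)", None]
bottom = [None, None, "Kharif", "Rabi", "Total", "Rice", "Paddy"]
names = flatten_header_rows(top, bottom)
print(names)
```

With the real result you would call flatten_header_rows(table[0], table[1]), where table is the list returned by page.extract_table(...).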
Thank you for your suggestion. It worked!
They are mentioned in the description of the
With regard to plans, I'm just a fellow pdfplumber user. That would probably be a question for @jsvine.
Thanks for your help here, @cmdlineluser! @Dragon2fly, it's helpful to hear your confusion. To know about
I don't plan on making any major changes to this parameter or its availability. Does that answer your question, or have I misunderstood it?
Hi @jsvine,
From a user-experience perspective, the fewer parameters that need to be configured, the better. So I wonder whether there is a way to detect the text orientation automatically and extract the text correctly. Anyway, even though the … The correct one should be … Any suggestions?
@Dragon2fly Thank you for clarifying. At the moment, adding automatic text-direction detection isn't on my roadmap, due to the likely large number of edge cases and my preference to keep extraction "predictable" and parameters explicit. But I appreciate the suggestion and will keep your use case in mind. Re. lines merging: Try decreasing the
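If you would rather not hard-code the direction, one user-side workaround is to inspect the characters first and pick text_vertical_ttb from them. This is an assumption-laden heuristic, not a pdfplumber feature: guess_vertical_ttb is hypothetical, and it assumes your pdfplumber version exposes the upright flag and the matrix tuple (a, b, c, d, e, f) on each char dict. For a rotated glyph, b is the vertical component of the text direction in PDF space, where y grows upward, so b > 0 suggests bottom-to-top text and b < 0 top-to-bottom.

```python
from typing import Dict, Iterable, Optional


def guess_vertical_ttb(chars: Iterable[Dict]) -> Optional[bool]:
    """Guess a value for the ``text_vertical_ttb`` table setting by majority vote.

    Hypothetical helper. Assumes each char dict has an ``upright`` flag and a
    ``matrix`` tuple (a, b, c, d, e, f). In PDF space y grows upward, so for a
    rotated glyph b > 0 reads bottom-to-top and b < 0 reads top-to-bottom.
    Returns None when there is no vertical text or the votes tie.
    """
    ttb_votes = btt_votes = 0
    for ch in chars:
        if ch.get("upright", True):
            continue  # horizontal glyphs say nothing about vertical direction
        b = ch["matrix"][1]
        if b < 0:
            ttb_votes += 1
        elif b > 0:
            btt_votes += 1
    if ttb_votes == btt_votes:
        return None
    return ttb_votes > btt_votes


# Synthetic chars: one horizontal glyph and two rotated 90 degrees counterclockwise.
sample = [
    {"text": "A", "upright": True, "matrix": (1, 0, 0, 1, 0, 0)},
    {"text": "R", "upright": False, "matrix": (0, 1, -1, 0, 0, 0)},
    {"text": "i", "upright": False, "matrix": (0, 1, -1, 0, 0, 0)},
]
setting = guess_vertical_ttb(sample)  # False: the rotated glyphs read bottom-to-top
```

When the result is not None, you could pass it as page.extract_table({"text_vertical_ttb": setting}), feeding the helper page.chars or the chars of a cropped region around the header. A table that mixes both orientations would still defeat a single global setting, which is one of the edge cases that makes automatic detection hard.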
Hi @jsvine. Thanks for your suggestion.
Thank you, @Dragon2fly. Looking into this, there may be a bug in how
This rhymes with #942; going to work on it.
Describe the bug
Extracting a table with vertical header text returns unreadable strings or text in reversed order.
Have you tried repairing the PDF?
Yes. The problem is still there.
Code to reproduce the problem
PDF file
agstat.pdf
Expected behavior
The vertical text in the red box should be extracted correctly.
Actual behavior
It returned unreadable text for the first row, and reversed text for the second row.
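Reversed strings are what you would expect if the glyphs of bottom-to-top text are joined in top-to-bottom order. The toy model below illustrates that effect only; it is a simplification, not pdfplumber's actual word-extraction code. read_vertical is a hypothetical helper working on char dicts reduced to their text and top keys (top grows downward the page), and the coordinates are made up.

```python
from typing import Dict, List


def read_vertical(chars: List[Dict], top_to_bottom: bool = True) -> str:
    """Join the glyphs of one vertical line of text in reading order.

    Simplified model: glyphs are ordered by their ``top`` coordinate,
    which grows downward the page.
    """
    ordered = sorted(chars, key=lambda c: c["top"], reverse=not top_to_bottom)
    return "".join(c["text"] for c in ordered)


# "Rice" rotated 90 degrees counterclockwise: the first letter sits lowest on the page.
rice = [
    {"text": "R", "top": 130.0},
    {"text": "i", "top": 124.0},
    {"text": "c", "top": 120.0},
    {"text": "e", "top": 114.0},
]
wrong = read_vertical(rice, top_to_bottom=True)   # reads "eciR"
right = read_vertical(rice, top_to_bottom=False)  # reads "Rice"
```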
Screenshots
The table outline is still detected correctly.
Environment