Gibberish in output #489
Comments
Hi @kevinburke, nice to see you here :) This is almost certainly an issue with how PDFBox, the library Tabula uses to interact with the PDF at a low level, handles PDFs generated in unusual ways. The best fix is to re-encode the PDF with pdftk, Acrobat, or a tool of your choice. That generally fixes things.
It could also be a subsetted font, which is essentially a non-standard encoding. See this StackOverflow answer.
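One rough way to check the subsetted-font theory is to look at the font names in the PDF. By PDF convention, a subset font's name starts with a tag of six uppercase letters followed by `+` (for example `ABCDEF+Helvetica`). The sketch below only tests names against that pattern; the helper `is_subset_font_name` and the sample names are made up for illustration, and you would need to get the real font list from a separate tool (for instance the `pdffonts` utility, if you have it installed).

```python
import re

# Subset fonts carry a tag: six uppercase letters, then "+", then the base font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def is_subset_font_name(font_name: str) -> bool:
    """Return True if the name looks like a subset font, e.g. 'ABCDEF+Helvetica'."""
    # Strip a leading "/" in case the name was copied as a raw PDF name object.
    return bool(SUBSET_TAG.match(font_name.lstrip("/")))


# Hypothetical font names, not taken from the PDF attached to this issue.
names = ["ABCDEF+Helvetica", "Times-Roman", "/XYZABC+ArialMT"]
flagged = [n for n in names if is_subset_font_name(n)]
print(flagged)
```

A match does not prove the export will be garbled, since many subset fonts extract fine. It only tells you the font is a subset, which makes a missing or non-standard character mapping a more plausible cause.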
I'm using Tabula for Mac. We are trying to export the tables in the attached PDF.
concord_housing_table.pdf
The initial upload generated a lot of overlapping selections. We removed all of them except for the selections that covered the entire table row.
When we go to export, the output looks like complete gibberish:
We're confused about this, because clearly it's meaningful gibberish - the number of gibberish characters corresponds to the text in the original file. Maybe we missed an encoding setting? We tried using the tools in the app but didn't see anything meaningful.