hocr-pdf printing Hebrew text in opposite direction in the generated pdf file #163

Open
smijo149 opened this issue Feb 2, 2021 · 4 comments

@smijo149
Contributor

smijo149 commented Feb 2, 2021

The pdf file generated using hocr-pdf has Hebrew text printed in the opposite direction.

Steps I followed:

  1. Used Google Cloud Vision to run OCR on the image.
  2. Used gcv2hocr to generate the hOCR file.
  3. Used hocr-pdf --savefile output.pdf actual-file.jpg to generate the PDF file.

The pdf file has Hebrew text inserted in it but in the reverse order.

Actual image:

[Screenshot: Screen Shot 2021-02-01 at 6 48 35 PM]

This is how my hocr file looks:

[Screenshot: Screen Shot 2021-02-01 at 7 01 04 PM]

Text in pdf file: (I have set text visibility mode to 0 so that the inserted text is visible)

[Screenshot: Screen Shot 2021-02-01 at 6 48 56 PM]

Hebrew is a right-to-left language, so I am not sure whether I need to pass a language or direction parameter to get this right.

@stweil
Collaborator

stweil commented Feb 2, 2021

I am afraid that hocr-pdf was never tested with RTL text. Using bidi like in https://github.com/tesseract-ocr/tesstrain/blob/master/generate_wordstr_box.py might fix that.
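
To illustrate the idea behind that suggestion, here is a minimal sketch of reordering a word from logical (typing) order into visual order before it is written to the PDF. It is not the code from hocr-pdf or from the linked script, and it is not the full Unicode Bidirectional Algorithm: it is a naive per-word reversal using only the standard library, assuming the PDF text is drawn left to right in the order the characters are stored. The names `has_rtl` and `to_visual_order` are hypothetical helpers made up for this example.

```python
import unicodedata

# Unicode bidi classes for strongly right-to-left characters
# ("R" covers Hebrew letters, "AL" covers Arabic letters).
RTL_CLASSES = {"R", "AL"}


def has_rtl(text: str) -> bool:
    """Return True if any character in text has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def to_visual_order(word: str) -> str:
    """Naive stand-in for a bidi reordering step (hypothetical helper).

    Reverses a word that contains RTL letters, so that a renderer which
    places characters left to right shows them in reading order.
    Words without RTL letters are returned unchanged.
    """
    return word[::-1] if has_rtl(word) else word


if __name__ == "__main__":
    logical = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew word in logical order
    print(to_visual_order(logical))       # same letters, reversed for LTR drawing
    print(to_visual_order("hocr"))        # Latin text is left as-is
```

A plain reversal like this mishandles words that mix Hebrew with digits or Latin letters, and it misplaces combining marks such as vowel points. A real bidi implementation, such as the bidi package mentioned above, handles those cases, which is why it is the better choice for an actual fix.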

@smijo149
Contributor Author

smijo149 commented Feb 3, 2021

Thanks! I will try it out and see if that works for me.

@joewiz

joewiz commented Apr 5, 2021

@smijo149 Looks like you solved this. I wonder if the maintainers of hocr-tools would be interested in your PR?

@smijo149
Contributor Author

smijo149 commented Apr 6, 2021

@joewiz Yeah, I was able to solve the issue based on @stweil's suggestion. I have opened PR #165 if anyone is interested. Thanks!
