hocr-pdf printing Hebrew text in opposite direction in the generated pdf file #163

smijo149 · 2021-02-02T01:02:33Z

The pdf file generated using hocr-pdf has Hebrew text printed in the opposite direction.

Steps I followed:

The PDF file contains the Hebrew text, but in reverse order.

Hebrew is a right-to-left language, so I am not sure whether I need to pass a language or direction parameter to get this right.

stweil · 2021-02-02T06:27:26Z

I am afraid that hocr-pdf was never tested with RTL text. Using bidi like in https://github.com/tesseract-ocr/tesstrain/blob/master/generate_wordstr_box.py might fix that.
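To illustrate what "using bidi" means here, the following is a minimal standard-library sketch of converting a line from logical order (how the OCR text is stored) to visual order (how glyphs are laid out left to right on the page). It is not the code from hocr-pdf or from the linked script, and it is not the full Unicode Bidirectional Algorithm: the helper names `has_rtl` and `to_visual_order` are hypothetical, and the sketch assumes space-separated words where each word is either entirely right-to-left or entirely left-to-right. A real fix would more likely rely on a complete implementation such as `get_display` from the python-bidi package.

```python
import unicodedata

# Bidi classes that mark strongly right-to-left characters
# (R = Hebrew and similar scripts, AL = Arabic letters).
RTL_CLASSES = {"R", "AL"}


def has_rtl(text: str) -> bool:
    """Return True if any character in `text` is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def to_visual_order(line: str) -> str:
    """Simplified logical-to-visual reordering (assumption: not full UAX #9).

    Lines without RTL characters are returned unchanged. Otherwise the
    word order is reversed, and the characters of each RTL word are
    reversed too, while LTR words and numbers keep their internal order.
    """
    if not has_rtl(line):
        return line
    words = line.split(" ")
    return " ".join(w[::-1] if has_rtl(w) else w for w in reversed(words))


# Hebrew sample written with escapes so the source itself is unambiguous:
# U+05E9 U+05DC U+05D5 U+05DD is the word "shalom" in logical order.
shalom = "\u05e9\u05dc\u05d5\u05dd"
visual = to_visual_order(shalom + " 123")
```

In this sketch the number `123` ends up to the left of the reversed Hebrew word, which is the layout a reader expects in a right-to-left line. Words that mix scripts, brackets that need mirroring, and explicit direction marks are not handled, which is why a full bidi implementation is the safer choice in practice.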

smijo149 · 2021-02-03T00:21:51Z

Thanks! I will try it out and see if that works for me.

joewiz · 2021-04-05T05:06:59Z

@smijo149 Looks like you solved this. I wonder if the maintainers of hocr-tools would be interested in your PR?

smijo149 · 2021-04-06T00:58:30Z

@joewiz Yeah, I was able to solve the issue based on @stweil's suggestion. I have opened PR #165 if anyone is interested. Thanks!

stweil added bug enhancement labels Feb 2, 2021

smijo149 mentioned this issue Feb 5, 2021

Fixed the way right to left language (Hebrew) text is inserted to pdf StarfishCo/hocr-tools#8

Merged

smijo149 mentioned this issue Apr 6, 2021

Added support for right to left (R2L) languages using bidi algorithm #165

Merged
