-
Hello!
-
I use the pdfcpu.ExtractPageContent(...) method to get the content in my program. If you look at the snippets in my post, the texts should be "31.", "August" and "2023", but I get "\000\026\000\024\000\021", "\000$\000X\000J\000X\000V\000W" and "\000\025\000\023\000\025\000\026".
-
The command returns the raw page content in PDF syntax.
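For what it's worth, the escaped strings quoted in this thread look like 2-byte glyph codes rather than ASCII, which is common when a font is embedded with an Identity-H style encoding (an assumption here, since the font dictionary is not shown). Comparing them with the expected "31.", "August" and "2023" suggests that each code equals the character code minus 29. The Go sketch below resolves the octal escapes and applies that offset. The names `unescapePDFString`, `decodeGlyphString` and `glyphOffset` are hypothetical helpers, not part of pdfcpu, and the offset is inferred from this one sample only.

```go
package main

import (
	"fmt"
	"strings"
)

// glyphOffset is an ASSUMPTION inferred from the sample strings in this
// thread (glyph code + 29 == ASCII code). Other fonts will likely differ.
const glyphOffset = 29

// unescapePDFString resolves the backslash escapes of a PDF literal string
// (the text between the parentheses of a Tj operand) into raw bytes.
func unescapePDFString(s string) []byte {
	var out []byte
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c != '\\' || i+1 >= len(s) {
			out = append(out, c)
			continue
		}
		i++
		switch e := s[i]; {
		case e >= '0' && e <= '7':
			// Octal escape: one to three octal digits.
			v, n := 0, 0
			for n < 3 && i < len(s) && s[i] >= '0' && s[i] <= '7' {
				v = v*8 + int(s[i]-'0')
				i++
				n++
			}
			i-- // the outer loop's i++ steps past the last digit
			out = append(out, byte(v))
		case e == 'n':
			out = append(out, '\n')
		case e == 'r':
			out = append(out, '\r')
		case e == 't':
			out = append(out, '\t')
		case e == 'b':
			out = append(out, '\b')
		case e == 'f':
			out = append(out, '\f')
		default:
			// Covers \\, \( and \): keep the escaped character itself.
			out = append(out, e)
		}
	}
	return out
}

// decodeGlyphString reads the bytes as big-endian 2-byte codes and shifts
// each one by glyphOffset. This is a heuristic, not a ToUnicode lookup.
func decodeGlyphString(s string) string {
	raw := unescapePDFString(s)
	var sb strings.Builder
	for i := 0; i+1 < len(raw); i += 2 {
		code := int(raw[i])<<8 | int(raw[i+1])
		sb.WriteRune(rune(code + glyphOffset))
	}
	return sb.String()
}

func main() {
	for _, s := range []string{
		`\000\026\000\024\000\021`,
		`\000$\000X\000J\000X\000V\000W`,
		`\000\025\000\023\000\025\000\026`,
	} {
		fmt.Println(decodeGlyphString(s))
	}
}
```

For the three sample strings this prints "31.", "August" and "2023". A fixed offset is fragile, though: the dependable route is to read the /ToUnicode CMap of the font referenced as /F0, if the PDF provides one, and map each 2-byte code through it.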
-
Sorry, not at the moment,
-
Hi,
I wanted to extract the raw content of some PDFs I received. For the older ones it works just fine,
but in the newer ones the text is not plain readable ASCII. Instead it is in another format
that I couldn't identify.
The content of the older ones looks like this:
BT /F0 10.5 Tf 1 0 0 1 375.591 558.63 Tm (31.)Tj 1 0 0 1 393.08499 558.63 Tm (Mai)Tj 1 0 0 1 412.91 558.63 Tm (2023)Tj
The newer ones look like this (the PDFs are similar and the text is at the same position):
BT /F0 10.5 Tf 1 0 0 1 375.591 558.63 Tm (\000\026\000\024\000\021)Tj 1 0 0 1 393.08499 558.63 Tm (\000$\000X\000J\000X\000V\000W)Tj 1 0 0 1 428.66 558.63 Tm (\000\025\000\023\000\025\000\026)Tj
Does anybody know how I can decode this?