Bad encoding of Hindi Text #503

nikkiBot · 2024-02-07T07:42:44Z

I have a PDF that I want to extract a table from. The package has worked perfectly on most of the PDFs I've used it on before, but this time the Hindi text comes out as gibberish in Latin (English) characters.
Note that the dependencies are properly installed, so that shouldn't be the issue here. This is what I'm doing:
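import camelot
import pandas as pd

pdf = "./Pradhanjee.pdf"
table = camelot.read_pdf(pdf, pages="all", flavor="lattice")
# Collect the DataFrame of every extracted table.
df = []
for i in range(len(table)):
    df.append(table[i].df)
# Stack all tables row-wise into a single DataFrame.
new_df = pd.DataFrame()
for i in range(len(df)):
    new_df = pd.concat([new_df, df[i]], axis=0)
# NOTE: `title` (the output file name) is not defined in this snippet.
new_df.to_excel(f"{title}.xlsx", index=False)
new_df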
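
A minimal sketch for quantifying the symptom, assuming the extracted cells are available as rows of strings. The helper name `devanagari_ratio` is hypothetical and not part of camelot or pandas; it only counts how many non-empty cells contain at least one character from the Unicode Devanagari block (U+0900 to U+097F).

```python
def devanagari_ratio(rows):
    """Return the fraction of non-empty cells that contain Devanagari text.

    `rows` is any iterable of rows, each row an iterable of cell values.
    (Hypothetical helper for illustration; not a camelot API.)
    """
    # Flatten the rows and keep only cells that are not blank.
    cells = [str(cell) for row in rows for cell in row if str(cell).strip()]
    if not cells:
        return 0.0
    # A cell counts as a hit if any character lies in U+0900..U+097F.
    hits = sum(
        any("\u0900" <= ch <= "\u097f" for ch in cell)
        for cell in cells
    )
    return hits / len(cells)
```

With the script above, the rows could be passed as `new_df.values.tolist()`. A ratio near 0 on a table that should be in Hindi confirms that the extracted text is not Unicode Devanagari, which helps separate an extraction problem from a display or Excel export problem.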

I'm not sure why this is happening. Any help would be appreciated :')
