Microsoft Excel ingestion #1412

BrunoBosshard · 2023-12-16T07:47:17Z

When I try to ingest Microsoft Excel (*.xlsx) files, I get this error message:

'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte

The files are in English and contain only standard Latin characters.

Is there any way around this, maybe a different document loader?

hossein890 · 2024-01-12T14:17:28Z

Any answer?

mastnacek · 2024-01-14T16:48:11Z

I have the same problem with docx files. Maybe some Microsoft garbage? I modified the Python script that loads the documents: if loading a file ends with an error, it writes a line to the log and continues with the next document.

lolo9538 Feb 12, 2024

Same here. I think the try/except should be added to the code in the repository.
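
For anyone who wants to try the log-and-continue workaround described in this sub-thread, here is a minimal sketch. It is not the repository's actual ingestion code: `ingest_all` and `load_document` are hypothetical names, and `load_document` stands in for whatever function your script already uses to read a single file.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable

logger = logging.getLogger("ingest")


def ingest_all(paths: Iterable[Path], load_document: Callable[[Path], str]):
    """Load each file; log failures and continue instead of aborting.

    `load_document` is a hypothetical stand-in for the existing loader.
    Returns (loaded, failed): a dict of path -> text and a list of
    (path, error message) pairs.
    """
    loaded = {}
    failed = []
    for path in paths:
        try:
            loaded[path] = load_document(path)
        except Exception as exc:  # broad on purpose: any loader error skips the file
            logger.warning("Skipping %s: %s", path, exc)
            failed.append((path, str(exc)))
    return loaded, failed
```

Note that this only hides the symptom: the files that fail are skipped, not ingested, so the `failed` list is worth checking afterwards.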

SuperSonnix71 · 2024-02-25T23:15:54Z

The error you're seeing suggests that you are decoding or processing a file with the wrong character encoding. In the context of ingesting Microsoft Excel (*.xlsx) files, this error occurs if you’re trying to read the file as if it were a plain text file or using an incorrect method that implicitly assumes a text encoding.

Excel files (.xlsx) are actually a collection of XML files compressed into a single ZIP package. Therefore, trying to read them directly as UTF-8 encoded text files will lead to errors like the one you’re seeing.
Install and use pandas or openpyxl.
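
To check whether a file that fails with this error really is a ZIP package rather than text, something like the following sketch may help. It uses only the Python standard library; `describe_container` is a hypothetical helper name, and the `xl/` and `word/` prefixes are the folder names normally found inside .xlsx and .docx packages.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # signature of a ZIP local file header


def describe_container(data: bytes) -> str:
    """Report whether raw bytes look like a ZIP-based Office file or plain text."""
    if data.startswith(ZIP_MAGIC) and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
        if any(name.startswith("xl/") for name in names):
            return "zip: looks like an Excel workbook"
        if any(name.startswith("word/") for name in names):
            return "zip: looks like a Word document"
        return "zip: other archive"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary: not ZIP and not valid UTF-8"
    return "text: valid UTF-8"
```

Calling it as `describe_container(Path("report.xlsx").read_bytes())` (with a file name of your own) should report a ZIP archive for a normal workbook, which would confirm that a plain-text loader is the wrong tool for it.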

TobiasJu Mar 7, 2024

Can you elaborate on pandas or openpyxl?
Do you mean to extract Excel tables into plain text and then ingest them?
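
One way to read the suggestion is exactly that: convert each sheet to plain text first, then ingest the text. The sketch below assumes openpyxl is installed (`pip install openpyxl`); `rows_to_text` and `xlsx_to_text` are hypothetical helper names, and the tab-separated layout is just one possible choice, not something the project prescribes.

```python
from typing import Any, Iterable, Sequence


def rows_to_text(sheet_name: str, rows: Iterable[Sequence[Any]]) -> str:
    """Render worksheet rows as tab-separated lines under a sheet heading."""
    lines = [f"# Sheet: {sheet_name}"]
    for row in rows:
        cells = ["" if value is None else str(value) for value in row]
        if any(cells):  # skip rows that are entirely empty
            lines.append("\t".join(cells))
    return "\n".join(lines)


def xlsx_to_text(path: str) -> str:
    """Convert every sheet of an .xlsx file to plain text (requires openpyxl)."""
    from openpyxl import load_workbook  # third-party, imported lazily

    # data_only=True returns cached formula results instead of formula strings.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        parts = [
            rows_to_text(sheet.title, sheet.iter_rows(values_only=True))
            for sheet in workbook.worksheets
        ]
    finally:
        workbook.close()
    return "\n\n".join(parts)
```

The resulting string can be written to a .txt file and ingested like any other text document. With pandas, `pandas.read_excel` followed by `DataFrame.to_csv` would be a comparable route.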
