Textractor doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can #408

arsher-b · 2024-11-20T03:19:02Z

Hello, I'm having an issue with the amazon-textract-textractor library: it doesn't detect the INVOICE_RECEIPT_ID field, but the AWS Textract Demo can detect it.

Here is the AWS Textract Demo result:

Here is the amazon-textract-textractor result:

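Here is the sample code:

```python
from textractor import Textractor

# Name of the AWS CLI profile to use (left blank in the original report).
extractor = Textractor(profile_name="")

# Synchronous AnalyzeExpense call on a local image.
document = extractor.analyze_expense(
    file_source="test.jpg",
    save_image=False,
)

# Take the first expense document detected in the image.
expense_doc = document.expense_documents[0]
summary_fields = expense_doc.summary_fields  # header-level fields (vendor, total, receipt ID, ...)
line_field = expense_doc.line_items_groups   # line-item tables
print(summary_fields)
```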
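One way to narrow this down is to check whether the raw AnalyzeExpense response already contains INVOICE_RECEIPT_ID, independent of textractor's parsing layer. The sketch below assumes boto3 is installed and credentials are configured; it calls the Textract `analyze_expense` API directly and scans `ExpenseDocuments[*].SummaryFields` for a normalized `Type.Text`. The helper names `find_summary_values` and `analyze_expense_raw` are hypothetical, not part of textractor.

```python
from typing import Any, Dict, List


def find_summary_values(response: Dict[str, Any], field_type: str) -> List[str]:
    """Return the detected value text of every summary field whose
    normalized type (e.g. "INVOICE_RECEIPT_ID") equals field_type."""
    values = []
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            if field.get("Type", {}).get("Text") == field_type:
                values.append(field.get("ValueDetection", {}).get("Text", ""))
    return values


def analyze_expense_raw(path: str) -> Dict[str, Any]:
    """Call the synchronous AnalyzeExpense API with boto3 directly,
    bypassing textractor (hypothetical helper for debugging)."""
    import boto3  # imported lazily so find_summary_values works without AWS

    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.analyze_expense(Document={"Bytes": f.read()})
```

Usage would be something like `find_summary_values(analyze_expense_raw("test.jpg"), "INVOICE_RECEIPT_ID")`. If the raw response has the field but textractor's output does not, the problem is in parsing; if the raw response lacks it too, the difference is on the API side.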

Sample Receipt:

Chuukwudi · 2024-12-02T23:51:01Z

I'm not sure what the backend implementation of the Textract demo is, but I have personally noticed that Textract's async calls produce better results than their sync equivalents.

Given that your input is just an image (a one-page document), it can be very tempting to call the sync API, extractor.analyze_expense(), because it's quicker and has less overhead. Try using the async extractor.start_expense_analysis() instead and compare the results. Note that the async API reads its input from S3, so the image has to be uploaded to a bucket first.
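For comparison outside textractor, here is a minimal sketch of the async flow using the boto3 Textract client directly (not textractor's `start_expense_analysis`). It assumes the image is already in S3; `bucket`, `key`, and the helper name `run_async_expense` are hypothetical placeholders. It starts the job, polls `get_expense_analysis` until the status leaves `IN_PROGRESS`, then follows `NextToken` to collect every page of results.

```python
import time
from typing import Any, Dict, List


def run_async_expense(client: Any, bucket: str, key: str,
                      poll_seconds: float = 5.0) -> List[Dict[str, Any]]:
    """Start an async expense analysis job on an S3 object and return
    all ExpenseDocuments once the job finishes (hypothetical helper)."""
    job_id = client.start_expense_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        page = client.get_expense_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(f"Expense analysis job {job_id} ended with {status}")

    # Results may be split across pages; follow NextToken to collect them all.
    documents = list(page.get("ExpenseDocuments", []))
    while "NextToken" in page:
        page = client.get_expense_analysis(JobId=job_id, NextToken=page["NextToken"])
        documents.extend(page.get("ExpenseDocuments", []))
    return documents
```

With `client = boto3.client("textract")`, calling `run_async_expense(client, "my-bucket", "test.jpg")` returns the expense documents; each document's `SummaryFields` can then be checked for a `Type.Text` of `INVOICE_RECEIPT_ID` and compared against the sync result.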
