Textractor doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can #408

Open
arsher-b opened this issue Nov 20, 2024 · 1 comment

Comments

@arsher-b

Hello, I'm having an issue with the amazon-textract-textractor library: it doesn't detect the INVOICE_RECEIPT_ID field, but the AWS Textract Demo does.

Here is the output from the AWS Textract Demo:
[Screenshot 2024-11-20 at 11:06:06 AM]

And here is the output from amazon-textract-textractor:
[Screenshot 2024-11-20 at 11:06:49 AM]

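Here is the sample code:

```python
from textractor import Textractor

# profile_name is left empty as posted; set it to your AWS CLI profile name
extractor = Textractor(profile_name="")

# Synchronous AnalyzeExpense call on a single local image
document = extractor.analyze_expense(
    file_source="test.jpg",
    save_image=False,
)

# A single receipt yields one expense document
expense_doc = document.expense_documents[0]
summary_fields = expense_doc.summary_fields    # header-level fields (e.g. INVOICE_RECEIPT_ID)
line_field = expense_doc.line_items_groups     # line-item tables (not used below)
print(summary_fields)
```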

Sample receipt:
[Image: test]
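One way to narrow this down is to check whether INVOICE_RECEIPT_ID appears in the raw AnalyzeExpense JSON at all. If it does, the field is being lost between the service response and Textractor's summary_fields; if it doesn't, the difference lies in the service call itself. This is a minimal sketch over the documented response shape (ExpenseDocuments[].SummaryFields[].Type.Text and .ValueDetection.Text); the helper names `find_summary_field` and `summary_field_types` are hypothetical and not part of Textractor.

```python
from typing import Any, Optional


def find_summary_field(response: dict[str, Any], field_type: str) -> Optional[str]:
    """Return the detected value of the first summary field whose normalized
    type (Type.Text) equals field_type, or None if no such field exists."""
    for expense_doc in response.get("ExpenseDocuments", []):
        for field in expense_doc.get("SummaryFields", []):
            if field.get("Type", {}).get("Text") == field_type:
                return field.get("ValueDetection", {}).get("Text")
    return None


def summary_field_types(response: dict[str, Any]) -> set[str]:
    """Collect every normalized summary-field type present in the response."""
    return {
        field.get("Type", {}).get("Text", "")
        for expense_doc in response.get("ExpenseDocuments", [])
        for field in expense_doc.get("SummaryFields", [])
    }
```

The raw synchronous response can be obtained with boto3, for example `boto3.client("textract").analyze_expense(Document={"Bytes": open("test.jpg", "rb").read()})`, and passed directly to these helpers.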

@Chuukwudi
Contributor

I'm not sure what backend implementation the Textract demo uses, but I have personally noticed that Textract's asynchronous calls produce better results than their synchronous equivalents.

Since your input is just an image (a one-page document), it's tempting to call the synchronous API, extractor.analyze_expense(), because it's quicker and has less overhead. Try the asynchronous extractor.start_expense_analysis() instead and compare the results.
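Textractor's start_expense_analysis() is, as far as I can tell, built on Textract's StartExpenseAnalysis and GetExpenseAnalysis operations. For comparing outputs outside Textractor, here is a minimal sketch of the same asynchronous flow with the plain boto3 Textract client. It assumes the receipt has already been uploaded to S3, since asynchronous Textract jobs read their input from S3. The helper `run_async_expense` and its `poll_seconds`/`sleep` parameters are hypothetical, chosen so the polling loop can be exercised without AWS; `client` is expected to be `boto3.client("textract")`.

```python
import time
from typing import Any, Callable

# GetExpenseAnalysis states that mean the job has finished with results
_DONE = {"SUCCEEDED", "PARTIAL_SUCCESS"}


def run_async_expense(
    client: Any,
    bucket: str,
    key: str,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Start an asynchronous expense analysis of an S3 object, wait for it to
    finish, and return the ExpenseDocuments from every result page merged."""
    job = client.start_expense_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job is no longer IN_PROGRESS
    while True:
        page = client.get_expense_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status in _DONE:
            break
        if status == "FAILED":
            raise RuntimeError(
                f"Expense analysis job {job_id} failed: {page.get('StatusMessage')}"
            )
        sleep(poll_seconds)

    # Results may be split across pages linked by NextToken
    documents = list(page.get("ExpenseDocuments", []))
    token = page.get("NextToken")
    while token:
        page = client.get_expense_analysis(JobId=job_id, NextToken=token)
        documents.extend(page.get("ExpenseDocuments", []))
        token = page.get("NextToken")
    return {"ExpenseDocuments": documents}
```

For example, `run_async_expense(boto3.client("textract"), "my-bucket", "test.jpg")` (bucket name hypothetical) returns a dict whose ExpenseDocuments[].SummaryFields can be compared field by field with the synchronous AnalyzeExpense output. Polling keeps the sketch simple; the API also accepts a NotificationChannel for SNS completion notifications, which avoids waiting in a loop.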
