
get_text_from_layout_json throws 'NoneType' object is not subscriptable for a specific PDF #411

Open
neil-sola opened this issue Dec 2, 2024 · 1 comment


neil-sola commented Dec 2, 2024

get_text_from_layout_json raises TypeError: 'NoneType' object is not subscriptable for one specific PDF.

Unfortunately, I can't share the PDF for privacy reasons, but this line seems to be the cause:

```python
children = [(x, depth + 1) for x in relationships[0]['Ids']]
```
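That message is what Python raises when the value being indexed is None, so the line fails whenever relationships is None rather than a list. Below is a minimal reproduction, assuming relationships holds a block's "Relationships" value (JSON null becomes None once parsed); the block dict is a hypothetical stand-in, not real library state.

```python
# Hypothetical stand-in for one parsed block: JSON null becomes Python None.
block = {"BlockType": "LAYOUT_FIGURE", "Relationships": None}
relationships = block["Relationships"]
depth = 0

message = None
try:
    # Same expression as the failing line: relationships[0] is evaluated first,
    # and indexing None raises TypeError before any iteration happens.
    children = [(x, depth + 1) for x in relationships[0]["Ids"]]
except TypeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object is not subscriptable
```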

This might also be a problem with Textract's output itself rather than with this library's parsing. The issue seems isolated to this one PDF; other PDFs work fine. It appears to be related to the metadata or structure of the file itself: re-running Textract several times, changing the page orientation, and deleting pages did not fix it.

Has anyone else encountered this error or found a resolution?
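One way to check whether the problem lies in Textract's output is to scan the raw response for layout blocks that have no relationships at all. A minimal sketch, assuming the response has been saved as JSON with a top-level "Blocks" list (the shape returned by Textract's AnalyzeDocument); the function name is hypothetical and not part of this library.

```python
from collections import Counter
from typing import Any


def layout_blocks_without_relationships(response: dict[str, Any]) -> Counter[str]:
    """Count LAYOUT_* blocks whose "Relationships" is null, missing, or empty.

    Hypothetical diagnostic helper: any nonzero count marks a block that
    code indexing relationships[0] would fail on.
    """
    counts: Counter[str] = Counter()
    for block in response.get("Blocks", []):
        block_type = block.get("BlockType", "")
        if block_type.startswith("LAYOUT_") and not block.get("Relationships"):
            counts[block_type] += 1
    return counts
```

Usage: load the saved response with json.load and pass the resulting dict in; a result such as Counter({'LAYOUT_FIGURE': 1}) points at the offending block type.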

neil-sola (Author)

Found the specific issue: a LAYOUT_FIGURE block can have "Relationships": null, which breaks this function.

Example:

```json
{
  "BlockType": "LAYOUT_FIGURE",
  "ColumnIndex": null,
  "ColumnSpan": null,
  "Confidence": 94.62890625,
  "EntityTypes": null,
  "Geometry": {
    "BoundingBox": {
      "Height": 0.04673086851835251,
      "Left": 0.06788529455661774,
      "Top": 0.8822278380393982,
      "Width": 0.4904918074607849
    },
    "Polygon": [
      {"X": 0.06790152192115784, "Y": 0.8822278380393982},
      {"X": 0.5583770871162415, "Y": 0.8828750252723694},
      {"X": 0.558368444442749, "Y": 0.9289587140083313},
      {"X": 0.06788529455661774, "Y": 0.9283040761947632}
    ]
  },
  "Hint": null,
  "Id": "4859938e-4c4a-46bb-b40c-34d93486b824",
  "Page": 1,
  "PageClassification": null,
  "Query": null,
  "Relationships": null,
  "RowIndex": null,
  "RowSpan": null,
  "SelectionStatus": null,
  "Text": null,
  "TextType": null
}
```
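A possible workaround is to treat a null or missing "Relationships" as "no children" before building the (x, depth + 1) pairs. The sketch below is not the library's actual fix: child_ids and walk are hypothetical helpers, and selecting only relationships whose "Type" is "CHILD" (instead of blindly taking the first entry) is an assumption about what the original line intends.

```python
from typing import Any, Iterator


def child_ids(block: dict[str, Any]) -> list[str]:
    """Return the Ids of a block's CHILD relationships, or [] if there are none.

    Textract can emit "Relationships": null (e.g. on a LAYOUT_FIGURE with no
    text), so None and a missing key are both treated as "no children".
    """
    relationships = block.get("Relationships") or []
    ids: list[str] = []
    for rel in relationships:
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids") or [])
    return ids


def walk(blocks_by_id: dict[str, dict[str, Any]], root_id: str) -> Iterator[tuple[str, int]]:
    """Depth-first walk yielding (block_id, depth), like the (x, depth + 1) pairs."""
    stack = [(root_id, 0)]
    while stack:
        block_id, depth = stack.pop()
        yield block_id, depth
        children = [(x, depth + 1) for x in child_ids(blocks_by_id[block_id])]
        # Push in reverse so children are visited in their original order.
        stack.extend(reversed(children))
```

With the figure block above, child_ids returns an empty list, so the walk yields only the figure itself instead of raising TypeError.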
