Releases: aws-samples/amazon-textract-textractor
Version 1.8.5
What's Changed
- Fix bug in convert that caused an exception on empty pages.
Full Changelog: v1.8.4...v1.8.5
Version 1.8.4
What's Changed
- Add check for None bounding boxes for AnalyzeExpense by @Belval
- Allow Custom Separator in Document.export_kv_to_csv() by @Chuukwudi
- Update analyze_document type hint by @ryangamble
- Fix invalid escape in BoundingBox docstring by @simonschmidt in #395
Full Changelog: v1.8.3...v1.8.4
Version 1.8.3
What's Changed
- Id in html output by @Belval in #386
- Escape html output by @Belval in #387
- Fix table indexing returning too many cells
⚠️ Breaking changes
- To support ids in HTML, layout Table created for TABLE predictions will no longer share the same ID as the table.
Full Changelog: v1.8.2...v1.8.3
Version 1.8.2
What's Changed
- Fix pypdfium2 failing to parse PDFs in bytearray format by @Belval
Full Changelog: v1.8.1...v1.8.2
Version 1.8.1
What's Changed
Full Changelog: v1.8.0...v1.8.1
Version 1.8.0
What's Changed
- Improve HTML linearization
  - Add HTML table linearization format that uses merged cells information for colspan and rowspan
  - Add prefix and suffix for LAYOUT_FOOTER and LAYOUT_ENTITY
  - Add <html><body>...</body></html> to the output when calling Document.to_html()
- Use pypdfium2 for PDF rasterization when available instead of pdf2image. This allows for better portability as the former does not have a dependency on OS libraries and should work out of the box with Lambda and SageMaker.
- Fix expenses with no summary fields
- Replace region mismatch with invalid S3 object exception
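The "use pypdfium2 when available" behaviour can be pictured as an optional-dependency check. The sketch below is not Textractor's actual code; `pick_pdf_backend` is a hypothetical helper, and it only assumes that both backends are importable under the module names `pypdfium2` and `pdf2image`.

```python
import importlib.util


def pick_pdf_backend(preferred=("pypdfium2", "pdf2image")):
    """Return the name of the first importable backend, or None.

    Hypothetical helper: the preference order mirrors the release note
    (pypdfium2 first, pdf2image as the fallback).
    """
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    backend = pick_pdf_backend()
    print(f"PDF rasterization backend: {backend}")
```

Checking with `find_spec` avoids importing either library until one is actually needed, which keeps start-up cheap in environments such as Lambda.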
Backward-incompatible changes
- This update removes s3_output_path from the synchronous functions, as s3_output_path is not a supported parameter for the Textract Synchronous API.
- This update changes the exception raised by the textractor.py functions, which will no longer raise RegionMismatchError (which is, however, kept in textractor.exceptions for backward compatibility).
- This update removes confidence_score from KeyValue entities in favour of _confidence, which is used for all other entities.
Full Changelog: v1.7.12...v1.8.0
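For code that must run against releases on both sides of the confidence_score removal in 1.8.0, one option is a small accessor that tries the new attribute first. This is only a sketch: `read_confidence` is a hypothetical helper, and the stand-in objects below are plain namespaces rather than real KeyValue entities. The only names taken from the notes above are `_confidence` and `confidence_score`.

```python
from types import SimpleNamespace


def read_confidence(entity, default=None):
    """Return an entity's confidence regardless of which attribute it uses.

    Hypothetical helper: tries `_confidence` (1.8.0 and later, per the
    release note), then falls back to the removed `confidence_score`.
    """
    for attr in ("_confidence", "confidence_score"):
        value = getattr(entity, attr, None)
        if value is not None:
            return value
    return default


# Stand-ins for entities from a newer and an older release (assumed shapes).
new_style = SimpleNamespace(_confidence=0.97)
old_style = SimpleNamespace(confidence_score=0.88)

if __name__ == "__main__":
    print(read_confidence(new_style), read_confidence(old_style))
```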
Version 1.7.12
What's Changed
- Fix issue where tables linearized to plaintext that contained merged cells would duplicate the text over the entire table.
Full Changelog: v1.7.11...v1.7.12
Version 1.7.11
What's Changed
- Add figure layout prefix and suffix by @Belval in #362
- Add confidence scores at the DocumentEntity level by @Belval in #363
Full Changelog: v1.7.10...v1.7.11
Version 1.7.10
What's Changed
- Use AWS_REGION and AWS_DEFAULT_REGION environment variables in Textractor when available
- Fix missing figure layouts
Full Changelog: v1.7.9...v1.7.10
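The environment-variable lookup introduced in 1.7.10 might look roughly like the sketch below. `resolve_region` is a hypothetical function, not part of Textractor. The precedence shown (explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION) is an assumption, since the release note does not state an order.

```python
import os


def resolve_region(explicit=None, env=None):
    """Pick an AWS region name, or None if nothing is configured.

    Hypothetical helper. Assumed precedence: explicit argument,
    then AWS_REGION, then AWS_DEFAULT_REGION.
    """
    env = os.environ if env is None else env
    if explicit:
        return explicit
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = env.get(var)
        if value:  # skip unset and empty values
            return value
    return None


if __name__ == "__main__":
    print(resolve_region())
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.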