Releases: aws-samples/amazon-textract-textractor

Version 1.8.5

13 Nov 14:55

What's Changed

  • Fix bug in convert that caused an exception on empty pages.

Full Changelog: v1.8.4...v1.8.5

Version 1.8.4

06 Nov 22:29

What's Changed

  • Add check for None bounding boxes for AnalyzeExpense by @Belval
  • Allow Custom Separator in Document.export_kv_to_csv() by @Chuukwudi
  • Update analyze_document type hint by @ryangamble
  • Fix invalid escape in BoundingBox docstring by @simonschmidt in #395

Full Changelog: v1.8.3...v1.8.4

Version 1.8.3

21 Aug 16:22

What's Changed

  • Id in html output by @Belval in #386
  • Escape html output by @Belval in #387
  • Fix table indexing returning too many cells
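
The id and escaping changes above can be illustrated with a minimal sketch. It is not the library's implementation: `render_cell` and the id value are hypothetical names chosen for illustration, and only the standard-library `html.escape` is assumed.

```python
import html


def render_cell(text: str, entity_id: str) -> str:
    """Hypothetical helper: wrap text in a <td> that carries an id.

    Both the id attribute and the cell text are escaped so that characters
    such as <, > and & in the extracted text cannot break the markup.
    """
    safe_id = html.escape(entity_id, quote=True)
    safe_text = html.escape(text)
    return f'<td id="{safe_id}">{safe_text}</td>'


if __name__ == "__main__":
    print(render_cell("Total < 5 & > 3", "cell-1"))
```

Escaping at the point where markup is assembled keeps the raw extracted text untouched everywhere else.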

⚠️ Breaking changes

  • To support ids in HTML, the layout Table created for TABLE predictions no longer shares the same ID as the table.

Full Changelog: v1.8.2...v1.8.3

Version 1.8.2

25 Jun 13:44

What's Changed

  • Fix pypdfium2 failing to parse PDFs in bytearray format by @Belval

Full Changelog: v1.8.1...v1.8.2

Version 1.8.1

24 Jun 13:25

What's Changed

  • Fix .to_markdown() raising an exception on missing local config by @Belval in #381

Full Changelog: v1.8.0...v1.8.1

Version 1.8.0

21 Jun 00:54

What's Changed

  • Improve HTML linearization
    • Add HTML table linearization format that uses merged cells information for colspan and rowspan
    • Add prefix and suffix for LAYOUT_FOOTER and LAYOUT_ENTITY
    • Add <html><body>...</body></html> to the output when calling Document.to_html()
  • Use pypdfium2 for PDF rasterization when available instead of pdf2image. This allows for better portability as the former does not have a dependency on OS libraries and should work out of the box with Lambda and SageMaker.
  • Fix expenses with no summary fields
  • Replace region mismatch with invalid S3 object exception
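
The "use pypdfium2 when available" behaviour described above is a common optional-dependency pattern. The sketch below shows one way to express it; `pick_pdf_backend` is a hypothetical function, not part of Textractor, and the preference order is taken from the release note only.

```python
from importlib.util import find_spec
from typing import Optional, Sequence


def pick_pdf_backend(
    preferred: Sequence[str] = ("pypdfium2", "pdf2image"),
) -> Optional[str]:
    """Return the first importable module name from `preferred`, or None.

    find_spec only checks whether the module can be located; it does not
    import it, so probing stays cheap even for heavy packages.
    """
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    backend = pick_pdf_backend()
    print(backend or "no PDF rasterization backend installed")
```

Probing for the module instead of importing it inside a try/except keeps the choice of backend separate from the code that actually rasterizes pages.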

Backward-incompatible changes

  • This update removes s3_output_path from the synchronous functions as s3_output_path is not a supported parameter for the Textract Synchronous API
  • This update changes the exception raised by the textractor.py functions, which no longer raise RegionMismatchError (it is, however, kept in textractor.exceptions for backward compatibility).
  • This update removes confidence_score from KeyValue entities in favour of _confidence which is used for all other entities.

Full Changelog: v1.7.12...v1.8.0

Version 1.7.12

23 May 17:15

What's Changed

  • Fix issue where tables that contained merged cells would, when linearized to plaintext, duplicate the merged cell's text over the entire table.

Full Changelog: v1.7.11...v1.7.12

Version 1.7.11

13 May 20:26

What's Changed

  • Add figure layout prefix and suffix by @Belval in #362
  • Add confidence scores at the DocumentEntity level by @Belval in #363

Full Changelog: v1.7.10...v1.7.11

Version 1.7.10

19 Apr 02:00

What's Changed

  • Use AWS_REGION and AWS_DEFAULT_REGION environment variables in Textractor when available
  • Fix missing figure layouts
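
As a rough illustration of the environment-variable lookup mentioned above, the sketch below resolves a region from an explicit argument first, then AWS_REGION, then AWS_DEFAULT_REGION. The precedence order and the `resolve_region` helper are assumptions made for this example; the release note does not state how Textractor orders them.

```python
import os
from typing import Mapping, Optional


def resolve_region(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: pick a region name from several sources.

    Assumed precedence: explicit argument, AWS_REGION, AWS_DEFAULT_REGION,
    then the supplied default. Empty strings are treated as unset.
    """
    env = os.environ if env is None else env
    return (
        explicit
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or default
    )


if __name__ == "__main__":
    print(resolve_region(default="us-east-1"))
```

Passing the environment as a mapping makes the lookup easy to exercise without modifying the real process environment.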

Full Changelog: v1.7.9...v1.7.10

Version 1.7.9

22 Mar 14:54

What's Changed

Full Changelog: v1.7.8...v1.7.9