Releases · Unstructured-IO/unstructured

24 Apr 18:42

badGarnet

0.22.23

879e126

0.22.23 Latest


What's Changed

fix: first table chunk preserve col/row span by @badGarnet in #4343

Full Changelog: 0.22.22...0.22.23

Contributors

badGarnet


20 Apr 18:53

lawrence-u10d

0.22.22

ed76bfe

0.22.22

Security

Replace PyPI opencv wheels with ffmpeg-free builds in Docker image: After uv sync, the Dockerfile now substitutes all PyPI opencv-python variants with a source-built opencv-contrib-python-headless wheel compiled with WITH_FFMPEG=OFF, eliminating 14 bundled ffmpeg CVEs. The contrib-headless variant is a strict superset of the cv2 API (core + contrib modules, no GUI) so a single wheel replaces opencv-python, opencv-python-headless, and opencv-contrib-python.
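One way to confirm that an environment ended up with a single OpenCV distribution is to list which of the four PyPI variants are installed. The following is a minimal sketch, not part of the project's Dockerfile; the helper name `installed_opencv_variants` is hypothetical, and it relies only on the standard-library `importlib.metadata`.

```python
from importlib.metadata import distributions
from typing import Iterable, Optional, Set

# The four PyPI distribution names mentioned in the release note.
OPENCV_VARIANTS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def installed_opencv_variants(names: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the OpenCV distributions present in `names`.

    When `names` is None, the names of all distributions installed in the
    current environment are used. Underscores are normalized to hyphens.
    """
    if names is None:
        names = (dist.metadata["Name"] for dist in distributions())
    normalized = {name.lower().replace("_", "-") for name in names if name}
    return normalized & OPENCV_VARIANTS
```

In an image built as described above, calling `installed_opencv_variants()` would be expected to report only the contrib-headless distribution, assuming the source-built wheel keeps that distribution name.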


14 Apr 14:40

badGarnet

0.22.21

3ac4443

0.22.21

What's Changed

feat: add option to skip table chunking by @badGarnet in #4338

Full Changelog: 0.22.20...0.22.21

Contributors

badGarnet


14 Apr 01:24

vladimir-kivi-ds

0.22.20

dfb1653

0.22.20

What's Changed

Fix fixtures update CI to regenerate markdown by @vladimir-kivi-ds in #4332
fix(deps): upgrade vulnerable transitive dependencies [security] by @utic-github-cicd-token-generator[bot] in #4334
feat: add GHA workflow to build opencv wheels without ffmpeg by @lawrence-u10d in #4335
Enable vertical text detection for rotated images by @vladimir-kivi-ds in #4328

New Contributors

@utic-github-cicd-token-generator[bot] made their first contribution in #4334

Full Changelog: 0.22.18...0.22.20

Contributors

vladimir-kivi-ds and lawrence-u10d


13 Apr 22:56

github-actions

opencv-4.12.0.88

d0aa8eb

OpenCV Wheels 4.12.0.88 (no ffmpeg)

OpenCV Python contrib-headless wheels built from source with WITH_FFMPEG=OFF.

These wheels eliminate bundled ffmpeg CVEs present in the stock PyPI wheels.
Built against cgr.dev/chainguard/wolfi-base:latest with Python 3.12.

The contrib-headless variant provides the full cv2 API (core + contrib
modules, no GUI), so a single wheel can satisfy opencv-python,
opencv-python-headless, opencv-contrib-python, and
opencv-contrib-python-headless in downstream Dockerfiles.

Source version: opencv-contrib-python-headless==4.12.0.88
Build flags: CMAKE_ARGS='-DWITH_FFMPEG=OFF' ENABLE_CONTRIB=1 ENABLE_HEADLESS=1
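To check that an installed wheel was really built without ffmpeg, the text returned by `cv2.getBuildInformation()` can be inspected. The sketch below parses that report; the helper name `ffmpeg_status` is hypothetical, and the assumed line shape (`FFMPEG:` followed by `YES` or `NO`) reflects how OpenCV usually prints its Video I/O section, which may vary between versions.

```python
import re
from typing import Optional


def ffmpeg_status(build_info: str) -> Optional[str]:
    """Return the first token after 'FFMPEG:' in an OpenCV build report.

    Returns None when the report has no FFMPEG line at all.
    """
    match = re.search(r"^\s*FFMPEG:\s*(\S+)", build_info, flags=re.MULTILINE)
    return match.group(1) if match else None


# Usage, assuming OpenCV is installed in the environment:
#   import cv2
#   print(ffmpeg_status(cv2.getBuildInformation()))
```

For a wheel built with `-DWITH_FFMPEG=OFF`, the reported status should be `NO`.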


08 Apr 14:02

badGarnet

0.22.18

d299095

0.22.18

What's Changed

fix(chunking): preserve semantic headers in carried table chunks by @cragwolfe in #4313
feat: add page number support to v1 html partition by @badGarnet in #4327

Full Changelog: 0.22.16...0.22.18

Contributors

badGarnet and cragwolfe


03 Apr 20:44

cragwolfe

0.22.16

264d569

0.22.16

Enhancements

Formula markdown export (element_to_md / elements_to_md): New keyword-only parameter formula_markdown_style accepts "auto", "display_math", or "plain" (default "auto"). In "auto", display math ($$ ... $$) is used only when the text looks like notation (heuristic score) and contains no $ or $$, which avoids breaking Markdown and wrapping noisy OCR captions. "display_math" wraps whenever it is safe to do so, and still falls back to plain text if a $ would corrupt the fences. "plain" emits the text only. The optional normalize_formula (default True) maps common Unicode operators to LaTeX-like tokens; it stays before the keyword-only options, so callers that pass encoding / no_group_by_page positionally are unchanged. Unicode √ is never mapped to \\sqrt{}. Module constants: FORMULA_MARKDOWN_AUTO, FORMULA_MARKDOWN_DISPLAY_MATH, FORMULA_MARKDOWN_PLAIN.
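The decision logic described for the three styles can be illustrated with a small standalone sketch. This is not the library's implementation: `formula_to_md` and `looks_like_notation` are hypothetical names, the regular expression is a stand-in for the real heuristic score, and the exact `$$ ... $$` spacing is an assumption.

```python
import re

# Style names mirror the strings in the release note; everything else is illustrative.
AUTO, DISPLAY_MATH, PLAIN = "auto", "display_math", "plain"


def looks_like_notation(text: str) -> bool:
    """Hypothetical stand-in for the library's heuristic score."""
    return bool(re.search(r"[=+\-*/^_\\<>≤≥±∑∫]", text))


def formula_to_md(text: str, style: str = AUTO) -> str:
    """Render formula text as Markdown under one of the three styles."""
    if style not in (AUTO, DISPLAY_MATH, PLAIN):
        raise ValueError(f"unknown formula style: {style!r}")
    text = text.strip()
    # "plain" never wraps; any "$" in the text would corrupt the fences.
    if style == PLAIN or "$" in text:
        return text
    # "auto" wraps only when the text looks like mathematical notation.
    if style == AUTO and not looks_like_notation(text):
        return text
    return f"$$ {text} $$"
```

Under these assumptions, `formula_to_md("E = mc^2")` is wrapped as display math, while a caption-like string such as `"Figure 3 caption"` is left as plain text in "auto" mode.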

0.22.15

Security

security: fix(deps): upgrade vulnerable transitive dependencies [security]

0.22.14

Enhancements

Deduplicate PDF rendering: Remove _render_pdf_pages and delegate to unstructured-inference's convert_pdf_to_image (which already has lazy per-page rendering). Peak memory for path_only=True drops from O(n_pages) to O(1 page) — 97% reduction on a 100-page PDF. Bumps inference dep to >=1.6.2.
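The memory saving comes from rendering one page at a time instead of holding every page image at once. The generator below sketches that pattern in isolation; `render_pages_lazily` and the `render_page` callback are hypothetical and do not reflect the signature of `convert_pdf_to_image`.

```python
from pathlib import Path
from typing import Callable, Iterator


def render_pages_lazily(
    n_pages: int,
    render_page: Callable[[int], bytes],
    out_dir: Path,
) -> Iterator[Path]:
    """Render pages one by one, write each to disk, and yield only its path.

    Only one rendered page is alive at any moment, so peak memory is bounded
    by a single page rather than by the whole document.
    """
    for page_number in range(1, n_pages + 1):
        image_bytes = render_page(page_number)  # hypothetical renderer callback
        path = out_dir / f"page_{page_number}.png"
        path.write_bytes(image_bytes)
        yield path  # the page bytes can be garbage-collected after this point
```

A caller that needs only paths can iterate the generator and never materialize more than one page image.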

0.22.13

Enhancements

Speed up standardize_quotes: Replace loop-based character replacement with a single str.translate() call using a pre-computed translation table. Also fixes a pre-existing bug where left smart quotes were never normalized due to duplicate dictionary keys.


02 Apr 16:27

qued

0.22.12

6ada488

0.22.12

What's Changed

mem: exclude unused spaCy pipeline components to reduce model memory by @KRRT7 in #4296
fix: pdfminer drops extractable text by @qued in #4310

Full Changelog: 0.22.10...0.22.12

Contributors

qued and KRRT7


31 Mar 15:50

badGarnet

0.22.10

b6cf510

0.22.10

What's Changed

fix(chunking): preserve nested table structure in reconstruction by @cragwolfe in #4301
Replace lazyproperty with functools.cached_property by @KRRT7 in #4282
mem: reduce PaddleOCR rec_batch_num from 6 to 1 by @KRRT7 in #4295
fix: isolate Table elements in pre-chunks by @claytonlin1110 in #4307
feat(chunking): repeat table headers on continuation chunks by @cragwolfe in #4298

Full Changelog: 0.22.6...0.22.10

Contributors

cragwolfe, KRRT7, and claytonlin1110


26 Mar 21:20

vladimir-kivi-ds

0.22.6

b0e86a4

0.22.6

What's Changed

fix(deps): Update security updates [SECURITY] by @utic-renovate[bot] in #4303
fix: Self-contained script for version extraction in release CI by @vladimir-kivi-ds in #4304

Full Changelog: 0.22.4...0.22.6

Contributors

vladimir-kivi-ds

