Releases · jsvine/pdfplumber

18 Aug 23:43

jsvine

v0.11.4

e921ea7

Fixed

Fix one type hint so that it doesn't throw error on Python 3.8 (h/t @andrekeller). (#1184)

Contributors

andrekeller

07 Aug 20:34

jsvine

v0.11.3

e2a707b

Added

Add Table.columns, analogous to Table.rows (h/t @Pk13055). (#1050 + d39302f)
Add Page.extract_words(return_chars=True), mirroring Page.search(..., return_chars=True); if this argument is passed, each word dictionary will include an additional key-value pair: "chars": [char_object, ...] (h/t @cmdlineluser). (#1173 + 1496cbd)
Add pdfplumber.open(unicode_norm="NFC"/"NFD"/"NFKC"/"NFKD"), where the values are the four options for Unicode normalization (h/t @petermr + @agusluques). (#905 + 03a477f)
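The four `unicode_norm` values are the standard Unicode normalization forms. As a rough illustration that uses Python's standard `unicodedata` module rather than pdfplumber itself, here is how each form treats an accented letter and the "ﬁ" ligature that often turns up in PDF text:

```python
import unicodedata

# Two encodings of "é": one precomposed code point vs. "e" + combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"
# The "fi" ligature (U+FB01), common in text extracted from PDFs.
ligature = "\ufb01"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    results = [unicodedata.normalize(form, s) for s in (precomposed, decomposed, ligature)]
    # Print code points so the differences between forms are visible.
    print(form, [" ".join(f"U+{ord(ch):04X}" for ch in r) for r in results])
```

The "C" forms compose, the "D" forms decompose, and the "K" forms additionally fold compatibility characters such as ligatures. With pdfplumber, the form would be chosen when opening a file, e.g. `pdfplumber.open("example.pdf", unicode_norm="NFKC")` (the filename is hypothetical).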

Changed

Change the default setting that pdfplumber.repair(...) passes to Ghostscript's -dPDFSETTINGS parameter from prepress to default, and make that setting configurable via .repair(setting=...), where the value is one of "default", "prepress", "printer", or "ebook" (h/t @Laubeee). (#874 + 48cab3f)
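To show where such a setting ends up on a Ghostscript command line, here is a sketch with a hypothetical `build_gs_args` helper. It is not pdfplumber's actual invocation; it only assumes Ghostscript's `pdfwrite` device and its `-dPDFSETTINGS=/<name>` option.

```python
# Hypothetical helper; pdfplumber's real Ghostscript command line may differ.
VALID_SETTINGS = ("default", "prepress", "printer", "ebook")

def build_gs_args(src, dest, setting="default", gs_path="gs"):
    """Build an argument list that rewrites `src` to `dest` with Ghostscript's pdfwrite device."""
    if setting not in VALID_SETTINGS:
        raise ValueError(f"setting must be one of {VALID_SETTINGS}, got {setting!r}")
    return [
        gs_path,
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{setting}",  # Ghostscript expects the value with a leading slash
        "-o", str(dest),              # -o also implies -dBATCH and -dNOPAUSE
        str(src),
    ]

print(build_gs_args("broken.pdf", "repaired.pdf"))
print(build_gs_args("broken.pdf", "repaired.pdf", setting="ebook"))
```

The resulting list could be handed to `subprocess.run(...)`. Restricting `setting` to the four values listed above is a choice of this sketch.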

Fixed

Fix handling of object coordinates when mediabox does not begin at (0,0) (h/t @wodny). (#1181 + 9025c3f + 046bd87)
Fix error on getting .annots/.hyperlinks from CroppedPage (due to missing .rotation and .initial_doctop attributes) (h/t @Safrone). (#1171 + e5737d2)
Fix problem where Page.crop(...) was not cropping .annots/.hyperlinks (h/t @Safrone). (#1171 + 22494e8)
Fix calculation of coordinates for .annots on CroppedPages. (0bbb340 + b16acc3)
Dereference structure element attributes (h/t @dhdaines). (#1169 + 3f16180)
Fix Page.get_attr(...) so that it fully resolves references before determining whether the attribute's value is None (h/t @zzhangyun + @mkl-public). (#1176 + c20cd3b)
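The gist of the mediabox fix can be sketched as a coordinate translation. This is a simplified illustration, not pdfplumber's code: `to_page_coords` is hypothetical, and it assumes the convention of measuring `top`/`bottom` downward from the top edge of the page.

```python
# Hypothetical sketch: convert a PDF-space bounding box (origin at the bottom-left,
# y increasing upward) into page-relative, top-based coordinates when the
# mediabox does not start at (0, 0).
def to_page_coords(bbox, mediabox):
    x0, y0, x1, y1 = bbox
    mb_x0, mb_y0, mb_x1, mb_y1 = mediabox
    return {
        "x0": x0 - mb_x0,      # shift so the mediabox's left edge is x = 0
        "x1": x1 - mb_x0,
        "top": mb_y1 - y1,     # distance from the mediabox's top edge
        "bottom": mb_y1 - y0,
    }

# A US Letter-sized page whose mediabox is shifted up by 100 points.
print(to_page_coords((72, 800, 172, 812), (0, 100, 612, 892)))
```

Ignoring the mediabox origin would measure `top` from the wrong edge, offsetting every object by the mediabox's shift.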

Contributors

petermr, wodny, and 8 other contributors

06 Jul 21:56

jsvine

v0.11.2

cf67246

Added

Add extra_attrs parameter to .dedupe_chars(...) to adjust the properties used when deduplicating (h/t @QuentinAndre11). (#1114)
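Conceptually, `extra_attrs` decides which character properties, besides text and position, must match for two characters to count as duplicates. A rough sketch, not pdfplumber's implementation, assuming char dictionaries with `text`, `x0`, `top`, `fontname`, and `size` keys; the `("fontname", "size")` default used here is an assumption:

```python
# Hypothetical sketch of character deduplication; not pdfplumber's implementation.
def dedupe_chars(chars, tolerance=1, extra_attrs=("fontname", "size")):
    """Drop chars that repeat an earlier char's text and extra_attrs at (nearly) the same spot."""
    kept = []
    for char in chars:
        key = (char["text"],) + tuple(char.get(attr) for attr in extra_attrs)
        is_duplicate = any(
            key == prev_key
            and abs(char["x0"] - prev["x0"]) <= tolerance
            and abs(char["top"] - prev["top"]) <= tolerance
            for prev_key, prev in kept
        )
        if not is_duplicate:
            kept.append((key, char))
    return [char for _, char in kept]

chars = [
    {"text": "A", "x0": 10.0, "top": 20.0, "fontname": "Helvetica", "size": 12},
    {"text": "A", "x0": 10.5, "top": 20.0, "fontname": "Helvetica-Bold", "size": 12},
]
print(len(dedupe_chars(chars)))                  # 2: fontname differs, so both are kept
print(len(dedupe_chars(chars, extra_attrs=())))  # 1: only text and position are compared
```

Narrowing `extra_attrs` makes deduplication more aggressive; widening it makes it more conservative.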

Development Changes

Remove testing for Python 3.8, add testing for Python 3.12. (944eaed)
Upgrade flake8, pytest, and pytest-cov; add setuptools and py as explicit dev requirements (for Python 3.12).

Contributors

QuentinAndre11

11 Jun 20:36

jsvine

v0.11.1

5a0a8fd

Fixed

Fix .open(..., repair=True) subprocess args (to avoid stderr being captured) (70534a7)
Fix coordinates of annots on rotated pages (aaa35c9)
Fix handling of PDFDocEncoding failures in decode_text(...) (#1147 + 4daf0aa)
Add .get_textmap.cache_clear() to page.close() (0a26f05)
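The `page.close()` change relies on the `cache_clear()` method that `functools.lru_cache` attaches to a cached function. A minimal sketch of the pattern, with a hypothetical `Page` class standing in for pdfplumber's:

```python
from functools import lru_cache

# Hypothetical stand-in for a page object; only the caching pattern is the point here.
class Page:
    def __init__(self, page_number):
        self.page_number = page_number

    @lru_cache()
    def get_textmap(self, layout=False):
        # Placeholder for an expensive computation whose result is worth caching.
        return f"textmap for page {self.page_number} (layout={layout})"

    def close(self):
        # The cache lives on the function, not the instance, and each cached key
        # holds a reference to `self`; clearing it releases those entries.
        self.get_textmap.cache_clear()

page = Page(1)
page.get_textmap()
page.get_textmap()  # served from the cache
print(Page.get_textmap.cache_info().currsize)  # 1
page.close()
print(Page.get_textmap.cache_info().currsize)  # 0
```

Because the cache is shared by all instances, `cache_clear()` in this sketch empties it for every page, not only the one being closed.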

07 Mar 12:57

jsvine

v0.11.0

53306dc

Summary: More control over the {left-to-right, right-to-left, top-to-bottom, bottom-to-top} direction that pdfplumber reads/writes text (many thanks to @afriedman412 for the idea and prototype in #1040), plus upgrading to pdfminer.six's latest release (which provides more detailed paths for curves), and some fixes.

Added

Add {line,char}_dir{,_rotated,_render} params, to provide better support for text that is not laid out as left-to-right characters in top-to-bottom lines (h/t @afriedman412). (850fd45)
Add curve["path"] and curve["dash"], thanks to pdfminer.six upgrade (see below). (1820247)
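As a rough illustration of what a character direction means, here is a hypothetical `order_chars` helper. It is not part of pdfplumber and ignores rotation and line grouping entirely; directions are written as `"ltr"`, `"rtl"`, `"ttb"`, and `"btt"`, the same strings that `word["direction"]` now uses.

```python
# Hypothetical helper: order one line's characters for a given reading direction.
# "ltr"/"rtl" sort on the horizontal axis; "ttb"/"btt" sort on the vertical axis,
# using top-based coordinates (larger "top" means lower on the page).
SORT_KEYS = {
    "ltr": (lambda c: c["x0"], False),
    "rtl": (lambda c: c["x1"], True),
    "ttb": (lambda c: c["top"], False),
    "btt": (lambda c: c["bottom"], True),
}

def order_chars(chars, char_dir="ltr"):
    key, reverse = SORT_KEYS[char_dir]
    return "".join(c["text"] for c in sorted(chars, key=key, reverse=reverse))

chars = [
    {"text": "b", "x0": 10, "x1": 16, "top": 10, "bottom": 16},
    {"text": "a", "x0": 0, "x1": 6, "top": 0, "bottom": 6},
    {"text": "c", "x0": 20, "x1": 26, "top": 20, "bottom": 26},
]
print(order_chars(chars, "ltr"))  # abc
print(order_chars(chars, "rtl"))  # cba
```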

Changed

Upgrade pdfminer.six from 20221105 to 20231228. (cd2f768)
Change value of word["direction"] from {1,-1} to {"ltr","rtl","ttb","btt"}. (850fd45)
Deprecate vertical_ttb, horizontal_ltr in favor of char_dir and char_dir_rotated. (850fd45)

Fixed

Fix layout-caching issue caused by 0bfffc2. (#1097 + efca277)
Fix missing ParentTree edge-case. (#1094)

Contributors

afriedman412

10 Feb 23:38

jsvine

v0.10.4

3bb642b

Added

Add x_tolerance_ratio parameter to extract_text and similar functions, to account for text size when spacing characters (instead of a fixed number of pixels) (h/t @afriedman412). (#1041)
Add support for PDF 1.3 logical structure via Page.structure_tree (h/t @dhdaines). (#963)
Add "gswin64c" as another possible Ghostscript executable in repair.py (h/t @echedey-ls). (#1032)
Re-add Page.close() method, have PDF.close() close all pages as well, and improve relevant documentation (h/t @luketudge). (#1042)
Add force_mediabox parameter to Page.to_image(...). (#1054)
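The difference between a fixed `x_tolerance` and a size-relative `x_tolerance_ratio` can be sketched as follows. This is a simplified illustration, not pdfplumber's spacing logic; `join_chars` is hypothetical, and scaling by the preceding character's `size` is an assumption of the sketch.

```python
# Hypothetical sketch of spacing decisions; pdfplumber's actual logic is more involved.
def join_chars(chars, x_tolerance=3, x_tolerance_ratio=None):
    """Join chars on one line, inserting a space where the horizontal gap is too large."""
    out = [chars[0]["text"]]
    for prev, cur in zip(chars, chars[1:]):
        # With a ratio, the threshold scales with the preceding char's font size.
        tolerance = x_tolerance if x_tolerance_ratio is None else x_tolerance_ratio * prev["size"]
        if cur["x0"] - prev["x1"] > tolerance:
            out.append(" ")
        out.append(cur["text"])
    return "".join(out)

chars = [
    {"text": "a", "x0": 0, "x1": 5, "size": 10},
    {"text": "b", "x0": 7, "x1": 12, "size": 10},
]
print(join_chars(chars))                         # ab  (gap of 2 <= fixed tolerance of 3)
print(join_chars(chars, x_tolerance_ratio=0.1))  # a b (gap of 2 > 0.1 * size 10)
```

A ratio lets the same setting work for both small footnote text and large headings, where word gaps differ in absolute size.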

Fixed

Standardize handling of cropbox, fixing various issues with PageImage. (#1054)
Fix Page.get_textmap caching to allow for extra_attrs=[...], by preconverting list kwargs to tuples. (#1030)
Explicitly close pypdfium2.PdfDocument in get_page_image (h/t @dhdaines). (#1090)
In PDFPageAggregatorWithMarkedContent.tag_cur_item, check self.cur_item._objs length before trying to access [-1]. (4f39d03)
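The caching fix works around the fact that `functools.lru_cache` requires hashable arguments, which lists are not. A minimal sketch of the preconversion idea, with hypothetical function names:

```python
from functools import lru_cache

@lru_cache()
def cached_textmap(page_number, extra_attrs=()):
    # Placeholder for an expensive, cacheable computation.
    return (page_number, extra_attrs)

def get_textmap(page_number, **kwargs):
    # lru_cache hashes its arguments and lists are unhashable, so convert any
    # list-valued keyword arguments to tuples before calling the cached function.
    kwargs = {k: tuple(v) if isinstance(v, list) else v for k, v in kwargs.items()}
    return cached_textmap(page_number, **kwargs)

print(get_textmap(1, extra_attrs=["fontname", "size"]))
# Calling cached_textmap(1, extra_attrs=["fontname"]) directly would raise TypeError.
```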

Contributors

dhdaines, luketudge, and 2 other contributors

26 Oct 14:08

jsvine

v0.10.3

2e838d1

Added

Add support for marked-content sequences, represented by mcid and tag attributes on char/rect/line/curve/image objects (h/t @dhdaines). (#961)
Add gs_path argument to pdfplumber.open(...) and pdfplumber.repair(...), to allow passing a custom Ghostscript path to be used for repairing. (#953)

Fixed

Respect use_text_flow in extract_text (h/t @dhdaines). (#983)

Contributors

dhdaines

29 Jul 19:04

jsvine

v0.10.2

f92a687

Added

Add PDF.path: A Path object for PDFs loaded by passing a path (unless repair=True), and None otherwise. (30a52cb + #948)
Accept Iterable objects for geometry utils (h/t @dhdaines). (53bee23 + #945)
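Accepting any `Iterable`, not just lists, means a utility must avoid indexing into or re-iterating its input, so that generators work too. A hypothetical single-pass bounding-box helper (not one of pdfplumber's utils) illustrates the idea:

```python
from typing import Dict, Iterable, Tuple

BBox = Tuple[float, float, float, float]

# Hypothetical helper: one pass over any iterable, so generators work as well as lists.
def merge_bboxes(objects: Iterable[Dict[str, float]]) -> BBox:
    """Return the (x0, top, x1, bottom) box enclosing every object."""
    iterator = iter(objects)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("merge_bboxes() needs at least one object") from None
    x0, top, x1, bottom = first["x0"], first["top"], first["x1"], first["bottom"]
    for obj in iterator:  # continues from the second object; nothing is re-read
        x0, top = min(x0, obj["x0"]), min(top, obj["top"])
        x1, bottom = max(x1, obj["x1"]), max(bottom, obj["bottom"])
    return (x0, top, x1, bottom)

rects = [
    {"x0": 10, "top": 20, "x1": 30, "bottom": 40},
    {"x0": 5, "top": 25, "x1": 50, "bottom": 35},
]
print(merge_bboxes(r for r in rects))  # (5, 20, 50, 40)
```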

Changed

Use pypdfium2's public (not private) .render(...) method (h/t @mara004). (28f4ebe + #899)

Fixed

Fix .to_image() for ZipExtFiles (h/t @Urbener). (30a52cb + #948)

Contributors

dhdaines, mara004, and Urbener

19 Jul 19:03

jsvine

v0.10.1

90742bd

A simple release:

Added

Add antialias boolean parameter to Page.to_image(...) and associated methods (h/t @cmdlineluser). (7e28931)

Contributors

cmdlineluser

16 Jul 22:37

jsvine

v0.10.0

00386ad

Changed

Normalize color representation to tuple[float|int, ...] (#917). (57d51bb)
Replace Wand with pypdfium2 for page.to_image(...). (b049373)
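A minimal sketch of what normalizing colors to a single tuple form can look like, assuming values arrive as a bare number (e.g. one grayscale component), a list, a tuple, or `None`; `normalize_color` is hypothetical and not pdfplumber's code:

```python
from typing import Optional, Sequence, Tuple, Union

Color = Tuple[Union[float, int], ...]

# Hypothetical helper: coerce the shapes a color value might arrive in
# into one tuple[float|int, ...] representation.
def normalize_color(value: Union[None, int, float, Sequence[Union[float, int]]]) -> Optional[Color]:
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return (value,)  # e.g. a single grayscale component
    return tuple(value)

print(normalize_color(0.5))        # (0.5,)
print(normalize_color([1, 0, 0]))  # (1, 0, 0)
```

With one representation, downstream code can compare colors or check their length without branching on type.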

Added

Add pdfplumber.repair(...) and .open(repair=True) (#824). (db6ae97)
Add Page.find_table(...) (#873). (3772af6)
Add quantize=True, colors=256, bits=8 arguments/defaults to PageImage.save(...). (b049373)
Extract and handle patterns + (some) color spaces. (97ca4b0)

Removed

Remove support for Python 3.7 (EOL'ed June 2023). (c9d24d5)
Remove vestigial 'font' and 'name' properties from PDF objects. (6d62054)

Fixed

Fix bug for re-crops that use relative=True (#914). (0de6da9)
Handle use_text_flow more consistently (#912). (b1db5b8)
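With `relative=True`, a crop box is presumably measured from the current (possibly already cropped) page's top-left corner, so each re-crop must be offset by the previous crop's origin. A hypothetical sketch of that arithmetic, assuming bounding boxes are `(x0, top, x1, bottom)` tuples:

```python
# Hypothetical sketch: resolve a crop box given relative to an already-cropped page.
def resolve_crop(current_bbox, bbox, relative=False):
    """Return an absolute (x0, top, x1, bottom) for `bbox` on a page cropped to `current_bbox`."""
    if not relative:
        return bbox
    cx0, ctop, _, _ = current_bbox
    x0, top, x1, bottom = bbox
    # Offset by the current crop's top-left corner, not the original page's.
    return (x0 + cx0, top + ctop, x1 + cx0, bottom + ctop)

first = resolve_crop((0, 0, 612, 792), (100, 100, 400, 500), relative=True)
second = resolve_crop(first, (10, 10, 110, 60), relative=True)
print(first, second)  # (100, 100, 400, 500) (110, 110, 210, 160)
```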
