Releases: jsvine/pdfplumber
v0.5.3
v0.5.2
Added
- Access to curve points. (E.g., page.curves[0]["points"].)
- Ability for .draw_line to draw curve points.
Changed
- Disaggregated "min_words_vertical" (default: 3) and "min_words_horizontal" (default: 1), removing "text_word_threshold".
- Internally, made utils.decimalize a bit more robust; now throws errors on non-decimalizable items.
- Now explicitly ignoring some (obscure) pdfminer object attributes.
- Raw input for .draw_line from a bounding box to ((x, y), (x, y)), for consistency with curve["points"] and with Pillow's underlying method.
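The change in .draw_line input can be pictured with a small sketch. It assumes a bounding box laid out as (x0, top, x1, bottom) and points given as (x, y) tuples; the helper names bbox_to_points and curve_segments are hypothetical and not part of pdfplumber.

```python
def bbox_to_points(bbox):
    """Convert an (x0, top, x1, bottom) box into a ((x, y), (x, y)) pair."""
    x0, top, x1, bottom = bbox
    return ((x0, top), (x1, bottom))


def curve_segments(points):
    """Pair up consecutive points so each pair describes one straight segment."""
    return list(zip(points, points[1:]))


# A horizontal line expressed the old way (bounding box) and the new way (point pair).
line_as_points = bbox_to_points((10, 20, 110, 20))

# A list shaped like curve["points"], broken into segments of the same form.
segments = curve_segments([(0, 0), (5, 5), (10, 0)])
```

With both lines and curves expressed as point tuples, the same drawing call can accept either without a conversion step.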
Fixed
- Fixed typo bug when .rect_edges is called before .edges
v0.5.1
Added
- Quick-draw PageImage methods: .draw_vline, .draw_vlines, .draw_hline, and .draw_hlines.
- Boolean parameter keep_blank_chars for .extract_words(...) and TableFinder settings.
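As a rough illustration of what a keep_blank_chars-style flag does, the sketch below groups characters on a single line into words. It is a simplified stand-in, not pdfplumber's algorithm: the function name group_words, the x_tolerance parameter, and the character dictionaries are assumptions made for the example. When the flag is off, blank characters act as word separators; when it is on, they are kept as part of the word.

```python
def group_words(chars, x_tolerance=3, keep_blank_chars=False):
    """Group left-to-right character dicts from one line into word strings."""
    words, current = [], []
    for ch in chars:
        if ch["text"].isspace() and not keep_blank_chars:
            # A blank character ends the current word and is discarded.
            if current:
                words.append(current)
                current = []
            continue
        # A horizontal gap wider than the tolerance also starts a new word.
        if current and ch["x0"] - current[-1]["x1"] > x_tolerance:
            words.append(current)
            current = []
        current.append(ch)
    if current:
        words.append(current)
    return ["".join(c["text"] for c in word) for word in words]


# Hypothetical characters for the text "a b", with no horizontal gaps between them.
chars = [
    {"text": "a", "x0": 0, "x1": 5},
    {"text": " ", "x0": 5, "x1": 8},
    {"text": "b", "x0": 8, "x1": 13},
]
split_on_blanks = group_words(chars)
blanks_kept = group_words(chars, keep_blank_chars=True)
```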
Changed
- Increased default text_tolerance and intersection_tolerance TableFinder values from 1 to 3.
Fixed
- Properly handle conversion of PDFs with transparency to pillow images.
- Properly handle pandas DataFrames as inputs to multi-draw commands (e.g., PageImage.draw_rects(...)).
v0.5.0
v0.4.0
- Adds Page.extract_words(...), inspired by @jsfenfen's coalesce_words.py
- Adds Page.filter(...)
- Adds height/width properties to CroppedPage
- Shifts idiom from .from_path to .open, and makes PDF class compatible with with statements.
- Fixes a memory leak (caused by misuse of atexit)
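The move to .open plus with-statement support follows Python's context-manager protocol. The sketch below shows the general pattern with a hypothetical Document class; it is not pdfplumber's implementation, only an illustration of how __enter__ and __exit__ let a resource be closed deterministically rather than through an exit hook such as atexit.

```python
import io


class Document:
    """Hypothetical stand-in for a PDF-like class that owns a file stream."""

    def __init__(self, stream):
        self.stream = stream

    @classmethod
    def open(cls, path):
        # Mirrors an ".open" idiom: build the object from a filesystem path.
        return cls(open(path, "rb"))

    def close(self):
        self.stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close the stream on normal exit and on errors; exceptions still propagate.
        self.close()


# An in-memory stream is used here so the example needs no file on disk.
with Document(io.BytesIO(b"%PDF-1.4")) as doc:
    header = doc.stream.read(4)
```

Once the with block ends, the stream is closed, so nothing is left for interpreter-exit cleanup to handle.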
v0.3.1
v0.3.0
A ton of improvements and new features:
- Shifts to a lazy-loading paradigm, so that you don't have to process an entire PDF just to access one page.
- Strips out pandas requirement and usage.
  - Results in a 3x-ish speedup for within_bbox and similar methods, thanks to short-circuiting & operators.
- Moves from floats to Decimals to improve accuracy of equality comparisons.
- Moves to a more modular architecture, adds Container, Page, and CroppedPage classes.
- Adds Page.crop(...).
- Adds Page.extract_table(...) for Tabula-like functionality.
- Adds PDF.metadata property.
- Adds derived properties Container.rect_edges and Container.edges, decomposing each rectangle into its constituent lines.
- Renames collate_chars(...) to get_text(...) (while retaining a reference to the former).
- Enriches the command-line tool's JSON output to include PDF metadata and page dimensions. [https://github.com//issues/3]