Fix object coordinates when mediabox != (0,0,...)
See issue #1181 for details.
jsvine committed Aug 5, 2024
1 parent 03a477f commit 9025c3f
Showing 5 changed files with 35 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ All notable changes to this project will be documented in this file. The format

 ### Fixed

+- Fix handling of object coordinates when `mediabox` does not begin at `(0,0)` (h/t @wodny). ([#1181](https://github.com/jsvine/pdfplumber/issues/1181))
 - Fix error on getting `.annots`/`.hyperlinks` from `CroppedPage` (due to missing `.rotation` and `.initial_doctop` attributes) (h/t @Safrone). ([#1171](https://github.com/jsvine/pdfplumber/issues/1171) + [e5737d2](https://github.com/jsvine/pdfplumber/commit/e5737d2))
 - Fix problem where `Page.crop(...)` was not cropping `.annots/.hyperlinks` (h/t @Safrone). [#1171](https://github.com/jsvine/pdfplumber/issues/1171)
 - Fix calculation of coordinates for `.annots` on `CroppedPage`s. ([0bbb340](https://github.com/jsvine/pdfplumber/commit/0bbb340))
1 change: 1 addition & 0 deletions README.md
@@ -568,6 +568,7 @@ Many thanks to the following users who've contributed ideas, features, and fixes
 - [Aron Weiler](https://github.com/aronweiler)
 - [Quentin André](https://github.com/QuentinAndre11)
 - [Léo Roux](https://github.com/leorouxx)
+- [@wodny](https://github.com/wodny)

 ## Contributing

13 changes: 11 additions & 2 deletions pdfplumber/page.py
@@ -415,11 +415,20 @@ def process_attr(item: Tuple[str, Any]) -> Optional[Tuple[str, Any]]:

             attr["dash"] = obj.dashing_style

+        # As noted in #1181, `pdfminer.six` adjusts objects'
+        # coordinates relative to the MediaBox:
+        # https://github.com/pdfminer/pdfminer.six/blob/1a8bd2f730295b31d6165e4d95fcb5a03793c978/pdfminer/converter.py#L79-L84
+        mb_x0, mb_top = self.mediabox[:2]
+
         if "y0" in attr:
-            attr["top"] = self.height - attr["y1"]
-            attr["bottom"] = self.height - attr["y0"]
+            attr["top"] = (self.height - attr["y1"]) + mb_top
+            attr["bottom"] = (self.height - attr["y0"]) + mb_top
             attr["doctop"] = self.initial_doctop + attr["top"]

+        if "x0" in attr and mb_x0 != 0:
+            attr["x0"] = attr["x0"] + mb_x0
+            attr["x1"] = attr["x1"] + mb_x0
+
         return attr

     def iter_layout_objects(
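For readers following the arithmetic in the `pdfplumber/page.py` change, the sketch below restates the adjustment as a standalone function. It is only an illustration: `adjust_for_mediabox` is a hypothetical helper name, not part of pdfplumber, and the page height, MediaBox values, and object coordinates are invented for the example. The variable names `mb_x0` and `mb_top` are taken from the diff.

```python
from typing import Any, Dict, Sequence


def adjust_for_mediabox(
    attr: Dict[str, Any],
    page_height: float,
    mediabox: Sequence[float],
    initial_doctop: float = 0.0,
) -> Dict[str, Any]:
    """Hypothetical standalone version of the adjustment made in this commit.

    Returns a copy of `attr` with `top`/`bottom`/`doctop` derived from
    `y0`/`y1`, and with `x0`/`x1` shifted by the MediaBox origin.
    """
    # First two MediaBox values, named as in the diff.
    mb_x0, mb_top = mediabox[:2]
    out = dict(attr)

    if "y0" in out:
        # Flip the y-axis (top-down instead of bottom-up), then add back
        # the vertical MediaBox offset.
        out["top"] = (page_height - out["y1"]) + mb_top
        out["bottom"] = (page_height - out["y0"]) + mb_top
        out["doctop"] = initial_doctop + out["top"]

    if "x0" in out and mb_x0 != 0:
        # Shift horizontally by the MediaBox offset.
        out["x0"] = out["x0"] + mb_x0
        out["x1"] = out["x1"] + mb_x0

    return out


# Invented example values: a page 792 units tall whose MediaBox
# begins at (100, 50) rather than (0, 0).
example = adjust_for_mediabox(
    {"x0": 10.0, "x1": 20.0, "y0": 700.0, "y1": 710.0},
    page_height=792.0,
    mediabox=(100.0, 50.0, 712.0, 842.0),
)
print(example)
```

With a MediaBox that begins at `(0, 0)`, both offsets are zero and the function reduces to the pre-commit behavior (`top = height - y1`, `bottom = height - y0`).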
Binary file added tests/pdfs/issue-1181.pdf
Binary file not shown.
22 changes: 22 additions & 0 deletions tests/test_issues.py
@@ -310,3 +310,25 @@ def test_issue_1147(self):
             page = pdf.pages[0]
             # Should not error:
             assert page.extract_text()
+
+    def test_issue_1181(self):
+        """
+        Correctly re-calculate coordinates when MediaBox does not start at (0,0)
+        """
+        path = os.path.join(HERE, "pdfs/issue-1181.pdf")
+        with pdfplumber.open(path) as pdf:
+            p0, p1 = pdf.pages
+            assert p0.crop(p0.bbox).extract_table() == [
+                ["FooCol1", "FooCol2", "FooCol3"],
+                ["Foo4", "Foo5", "Foo6"],
+                ["Foo7", "Foo8", "Foo9"],
+                ["Foo10", "Foo11", "Foo12"],
+                ["", "", ""],
+            ]
+            assert p1.crop(p1.bbox).extract_table() == [
+                ["BarCol1", "BarCol2", "BarCol3"],
+                ["Bar4", "Bar5", "Bar6"],
+                ["Bar7", "Bar8", "Bar9"],
+                ["Bar10", "Bar11", "Bar12"],
+                ["", "", ""],
+            ]
