Commit ef4787d

Fix not being able to pass boxes flow as None to pdf2txt (pdfminer#479)
* Fix not being able to pass boxes flow as None to pdf2txt
* Changes from code review
* Update CHANGELOG.md

Co-authored-by: Pieter Marsman <[email protected]>
1 parent f03657e commit ef4787d

2 files changed: 15 additions and 4 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 ## [Unreleased]

 ### Added
+
+- Option to disable boxes flow layout analysis when using pdf2txt ([#479](https://github.com/pdfminer/pdfminer.six/pull/479))
 - Support for `pathlib.PurePath` in `open_filename` ([#491](https://github.com/pdfminer/pdfminer.six/issues/491))

 ### Fixed

tools/pdf2txt.py

Lines changed: 13 additions & 4 deletions
@@ -16,6 +16,15 @@
                 (".tag", "tag"))


+def float_or_disabled(x):
+    if x.lower().strip() == "disabled":
+        return x
+    try:
+        x = float(x)
+    except ValueError:
+        raise argparse.ArgumentTypeError("invalid float value: {}".format(x))
+
+
 def extract_text(files=[], outfile='-',
                  no_laparams=False, all_texts=None, detect_vertical=None,
                  word_margin=None, char_margin=None, line_margin=None,
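
Review note on `float_or_disabled`: as written in this hunk, the helper assigns the parsed float to `x` but has no `return` on that path, so a numeric argument such as `0.5` would be handed to argparse as `None`. The following is a minimal sketch of a variant that returns the parsed value. It assumes the intent is for numeric input to reach the parser as a `float`; it is an illustration, not code from this commit.

```python
import argparse


def float_or_disabled(x):
    """Argparse type callable: accept "disabled" or anything float() parses."""
    # The sentinel is matched case-insensitively and ignoring surrounding
    # whitespace, but returned unchanged, as in the original helper.
    if x.lower().strip() == "disabled":
        return x
    try:
        # Return the parsed value instead of only rebinding the local name.
        return float(x)
    except ValueError:
        # argparse turns this into a usage error for the offending option.
        raise argparse.ArgumentTypeError("invalid float value: {}".format(x))
```

With this variant, `float_or_disabled("0.5")` yields the float `0.5`, while `float_or_disabled("disabled")` still yields the string unchanged.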
@@ -120,14 +129,14 @@ def maketheparser():
              "be part of the same paragraph. The margin is specified "
              "relative to the height of a line.")
     la_params.add_argument(
-        "--boxes-flow", "-F", type=float, default=0.5,
+        "--boxes-flow", "-F", type=float_or_disabled, default=0.5,
         help="Specifies how much a horizontal and vertical position of a "
              "text matters when determining the order of lines. The value "
              "should be within the range of -1.0 (only horizontal position "
              "matters) to +1.0 (only vertical position matters). You can also "
-             "pass `None` to disable advanced layout analysis, and instead "
-             "return text based on the position of the bottom left corner of "
-             "the text box.")
+             "pass `disabled` to disable advanced layout analysis, and "
+             "instead return text based on the position of the bottom left "
+             "corner of the text box.")
     la_params.add_argument(
         "--all-texts", "-A", default=False, action="store_true",
         help="If layout analysis should be performed on text in figures.")
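
To illustrate how a custom `type` callable such as `float_or_disabled` interacts with `--boxes-flow`, here is a small self-contained sketch. `build_parser` and `resolve_boxes_flow` are hypothetical names introduced only for this example, and the helper shown is a variant that returns the parsed float. How pdf2txt itself converts the `disabled` sentinel into `None` is not visible in this diff, so that mapping is an assumption.

```python
import argparse


def float_or_disabled(x):
    # Variant of the helper from this diff that returns the parsed float.
    if x.lower().strip() == "disabled":
        return x
    try:
        return float(x)
    except ValueError:
        raise argparse.ArgumentTypeError("invalid float value: {}".format(x))


def build_parser():
    # Hypothetical stand-in for the parser assembled in maketheparser().
    parser = argparse.ArgumentParser(prog="pdf2txt-sketch")
    parser.add_argument(
        "--boxes-flow", "-F", type=float_or_disabled, default=0.5)
    return parser


def resolve_boxes_flow(value):
    # Hypothetical helper (assumption): translate the CLI sentinel into None,
    # the value the commit title says should reach the layout parameters.
    if isinstance(value, str) and value.lower().strip() == "disabled":
        return None
    return value


parser = build_parser()
default_flow = parser.parse_args([]).boxes_flow
numeric_flow = parser.parse_args(["--boxes-flow", "0.8"]).boxes_flow
disabled_flow = resolve_boxes_flow(
    parser.parse_args(["-F", "disabled"]).boxes_flow)
```

Because argparse does not run the `type` callable on a non-string default, `default_flow` stays the float `0.5`. An unparsable value such as `-F abc` makes argparse print a usage error and exit.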
