Replace docx2txt with the Rust docx2txt-rs for 64-bit installers #708

Merged
dscho merged 3 commits into git-for-windows:main from dscho:docx2txt-rs on Jun 11, 2026

dscho commented Jun 5, 2026
Member

Replaces the Perl docx2txt MSYS package with the Rust mingw-w64-docx2txt-rs rewrite (https://packages.msys2.org/packages/mingw-w64-x86_64-docx2txt-rs, source at https://github.com/dscho/docx2txt-rs) for the 64-bit Git for Windows flavors. The new binary produces byte-identical output to the original on every fixture and drops the Perl dependency entirely.

To this end, the docx2txt-rs package is added as a dependency of git-extra. This will cause the (64-bit) Git for Windows SDKs to pull in that package. (32-bit does not matter: we do not include docx2txt in MinGit, and a 32-bit docx2txt-rs would not be available anyway.)

dscho requested a review from rimrul June 5, 2026 18:00
dscho self-assigned this Jun 5, 2026
dscho requested a review from mjcheetham June 5, 2026 18:00
dscho marked this pull request as ready for review June 5, 2026 18:00

rimrul commented Jun 9, 2026
Member

How does this degrade on Windows 8.1?

rimrul commented Jun 9, 2026
Member

> How does this degrade on Windows 8.1?

It seems to fall back gracefully to the Perl script as long as we keep shipping Perl.

dscho commented Jun 10, 2026
Member Author

@rimrul thank you for checking! I do not currently have any Windows 8.1 setup to test this...

The gentle downgrade path you verified is actually really good! That way, v2.55 can ship with both the Perl and the Rust versions, and users already get the benefit of the latter as long as they're not on EOLed Windows versions ;-)

dscho commented Jun 10, 2026
Member Author

Oh, but that means I should revert 44b1470 for now, right?

dscho added 3 commits June 11, 2026 08:15
The Perl-based `docx2txt` package has been the SDK's `.docx` textconv
helper since Git for Windows began shipping it, but the only consumer
that actually needs it is `astextplain`. The new
`mingw-w64-docx2txt-rs` package
(https://packages.msys2.org/packages/mingw-w64-x86_64-docx2txt-rs) is a
small Rust rewrite that produces byte-identical output to the original
on every fixture, drops the Perl dependency, and is published for
mingw64, ucrt64, clang64 and clangarm64 per the upstream
`mingw_arch=('mingw64' 'ucrt64' 'clang64' 'clangarm64')` declaration
at https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-docx2txt-rs/PKGBUILD.

Declare it as a dependency of the two 64-bit `git-extra` variants we
actually ship, so that the next nightly sync in git-sdk-64 and
git-sdk-arm64 pulls it in automatically, without the SDK repositories
needing their own targeted PRs. The bare `git-extra` MSYS variant and
`package_mingw-w64-i686-git-extra` are deliberately left untouched:
there is no `mingw-w64-i686-docx2txt-rs` upstream and 32-bit
installers are no longer built.

The follow-up commit teaches `astextplain` to prefer the new binary,
falling back to `docx2txt.pl` so the i686 and bare-MSYS variants of
git-extra (which do not gain the new dependency) keep working as long
as the legacy `docx2txt` package is still installed.

`pkgrel` is not bumped manually because the `pkgver()` function in
this PKGBUILD already derives the version from `git rev-list --count`
over the `git-extra/` directory (excluding `git-extra.install`), so
this commit alone will move the auto-derived `pkgver`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The previous commit added a hard dependency on
`${MINGW_PACKAGE_PREFIX}-docx2txt-rs` for the 64-bit `git-extra`
variants, which means `docx2txt.exe` (the Rust binary, named after the
`docx2txt` Cargo package per
https://github.com/dscho/docx2txt-rs/blob/main/Cargo.toml) is
guaranteed to be on `$PATH` for every modern Git for Windows SDK.

Teach the `.docx` branch of `astextplain` to call it. The new CLI per
https://github.com/dscho/docx2txt-rs/blob/main/README.md reads
exclusively from stdin and writes to stdout (no filename argument and
no `-` sentinel), so the invocation is just `docx2txt.exe <"$1"`.

The single line

	docx2txt.exe <"$1" || docx2txt.pl "$1" - || cat "$1"

uses a layered fallback rather than an `if command -v` guard,
matching the style of the other case branches in this script (e.g.
`odt2txt "$1" || cat "$1"`,
`out=$(antiword -m UTF-8 "$1") && sed ... || cat "$1"`) which all let
a missing helper fail naturally into the next fallback. On a modern
64-bit SDK the first leg always succeeds; on the legacy i686 and
bare-MSYS `git-extra` variants (which do not gain the dependency in
the previous commit) the script falls through to the Perl shim if it
is still installed, and finally `cat`s the raw `.docx` only when both
helpers are missing, exactly matching the prior
`docx2txt.pl ... || cat "$1"` semantics.
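
To make the fallback order concrete, here is a minimal sketch that runs under any POSIX shell. It assumes two hypothetical stub functions, `rust_helper` and `perl_helper`, standing in for `docx2txt.exe` and `docx2txt.pl`; they are not part of `astextplain` and only simulate a helper that is present, missing, or present but unable to run.

```shell
#!/bin/sh
# Hypothetical stand-ins for docx2txt.exe and docx2txt.pl: each prints a
# marker and succeeds only when its *_OK variable is "yes", otherwise it
# exits non-zero like a missing (or non-runnable) helper would.
rust_helper () { [ "$RUST_OK" = yes ] && echo "converted by Rust"; }
perl_helper () { [ "$PERL_OK" = yes ] && echo "converted by Perl"; }

# Same shape as the .docx branch: stdin for the first helper, a filename
# plus "-" for the second, and the raw file as the last resort.
convert () {
	rust_helper <"$1" || perl_helper "$1" - || cat "$1"
}

doc=$(mktemp)
printf 'raw .docx bytes\n' >"$doc"

RUST_OK=yes; PERL_OK=yes; convert "$doc"  # prints: converted by Rust
RUST_OK=no;  PERL_OK=yes; convert "$doc"  # prints: converted by Perl
RUST_OK=no;  PERL_OK=no;  convert "$doc"  # prints: raw .docx bytes

rm -f "$doc"
```

Because each leg of the `||` chain only runs when the previous one exits non-zero, the raw file is emitted only when both helpers fail.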

The `.exe` suffix is spelled out explicitly so that the lookup never
resolves to the old `/usr/bin/docx2txt` shell wrapper from the legacy
package, whose CLI is completely different (takes a filename, writes
`filename.txt`, no stdin/stdout interface).
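
The effect of spelling out the suffix can be sketched with two hypothetical stub scripts placed in a scratch `$PATH` entry, one named like the legacy `docx2txt` wrapper and one like the Rust binary. This is a simplified illustration under a plain POSIX shell, not a reproduction of the actual MSYS2 `$PATH` layout:

```shell
#!/bin/sh
# Hypothetical demo: two stub executables share one PATH directory, one
# named like the legacy shell wrapper and one like the Rust binary.
dir=$(mktemp -d)
printf '#!/bin/sh\necho "legacy docx2txt wrapper"\n' >"$dir/docx2txt"
printf '#!/bin/sh\necho "Rust docx2txt.exe"\n' >"$dir/docx2txt.exe"
chmod +x "$dir/docx2txt" "$dir/docx2txt.exe"
PATH="$dir:$PATH"

docx2txt      # bare name picks up the wrapper; prints: legacy docx2txt wrapper
docx2txt.exe  # explicit suffix names the binary; prints: Rust docx2txt.exe

rm -rf "$dir"
```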

`sha256sums[18]` for `astextplain` is updated to match the new file
contents; without this, `makepkg` would refuse to build the package
with an integrity-check failure.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The PKGBUILD's `pkgver()` function derives the version from
`git rev-list --count` over `git-extra/`, skipping commits that only
touch `pkgver=` or 64-hex source hashes. After the previous commits
in this series add the `${MINGW_PACKAGE_PREFIX}-docx2txt-rs`
dependency and rewrite `astextplain`, the derived value moves from
`1.1.693.6dc76c4f4` to `1.1.696.8dd445c32`; commit it so the
post-build "ensure worktree is clean" check in the
`build-packages (git-extra, ...)` CI job does not fail with
"Uncommitted changes after build!" when makepkg writes the new
value back into the PKGBUILD.

This follows the existing project convention seen most recently in
`83dbeadc git-extra: bump pkgrel`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

dscho commented Jun 11, 2026
Member Author

> Oh, but that means I should revert 44b1470 for now, right?

Done.

dscho commented Jun 11, 2026
Member Author

/deploy git-extra

The i686/x86_64 and the arm64 workflow runs were started.

dscho commented Jun 11, 2026
Member Author

/add relnote feature The diff helper handling Word documents was ported from Perl to Rust.

The workflow run was started

github-actions Bot pushed a commit that referenced this pull request Jun 11, 2026
The diff helper handling Word documents was
[ported](#708) from
Perl to Rust.

Signed-off-by: gitforwindowshelper[bot] <gitforwindowshelper-bot@users.noreply.github.com>

dscho merged commit ca2dfe1 into git-for-windows:main Jun 11, 2026
14 of 15 checks passed
dscho deleted the docx2txt-rs branch June 11, 2026 09:18

dscho commented Jun 11, 2026
Member Author

>> Oh, but that means I should revert 44b1470 for now, right?
>
> Done.

For the record, I moved this into a follow-up PR that I intend to merge after Git for Windows v2.55: #713

rimrul commented Jun 11, 2026
Member

> /add relnote feature The diff helper handling Word documents was ported from Perl to Rust.

Maybe we should mention that we're currently keeping the Perl script as a fallback for Windows 8.1.

dscho commented Jun 11, 2026
Member Author

>> /add relnote feature The diff helper handling Word documents was ported from Perl to Rust.
>
> Maybe we should mention that we're currently keeping the Perl script as a fallback for Windows 8.1.

Let's first see whether anyone screams? ;-)

3 participants