Replace docx2txt with the Rust docx2txt-rs for 64-bit installers #708
How does this degrade on Windows 8.1?

It seems to gracefully fall back to the

@rimrul thank you for checking! I do not currently have any Windows 8.1 setup to test this... The gentle downgrade path you verified is actually really good! That way, v2.55 can ship with both the Perl and the Rust version, and users already get the benefit of the latter as long as they're not on EOLed Windows versions ;-)

Oh, but that means I should revert 44b1470 for now, right?
The Perl-based `docx2txt` package has been the SDK's `.docx` textconv
helper since Git for Windows began shipping it, but the only consumer
that actually needs it is `astextplain`. The new
`mingw-w64-docx2txt-rs` package
(https://packages.msys2.org/packages/mingw-w64-x86_64-docx2txt-rs) is
a small Rust rewrite that produces byte-identical output to the
original on every fixture, drops the Perl dependency, and is published
for mingw64, ucrt64, clang64 and clangarm64 per the upstream
`mingw_arch=('mingw64' 'ucrt64' 'clang64' 'clangarm64')` declaration at
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-docx2txt-rs/PKGBUILD.

Declare it as a dependency of the two 64-bit `git-extra` variants we
actually ship, so that the next nightly sync in git-sdk-64 and
git-sdk-arm64 pulls it in automatically, without the SDK repositories
needing their own targeted PRs.

The bare `git-extra` MSYS variant and
`package_mingw-w64-i686-git-extra` are deliberately left untouched:
there is no `mingw-w64-i686-docx2txt-rs` upstream and 32-bit
installers are no longer built. The follow-up commit teaches
`astextplain` to prefer the new binary, falling back to `docx2txt.pl`
so the i686 and bare-MSYS variants of git-extra (which do not gain the
new dependency) keep working as long as the legacy `docx2txt` package
is still installed.

`pkgrel` is not bumped manually because the `pkgver()` function in
this PKGBUILD already derives the version from `git rev-list --count`
over the `git-extra/` directory (excluding `git-extra.install`), so
this commit alone will move the auto-derived `pkgver`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The previous commit added a hard dependency on
`${MINGW_PACKAGE_PREFIX}-docx2txt-rs` for the 64-bit `git-extra`
variants, which means `docx2txt.exe` (the Rust binary, named after the
`docx2txt` Cargo package per
https://github.com/dscho/docx2txt-rs/blob/main/Cargo.toml) is
guaranteed to be on `$PATH` for every modern Git for Windows SDK.

Teach the `.docx` branch of `astextplain` to call it. The new CLI per
https://github.com/dscho/docx2txt-rs/blob/main/README.md reads
exclusively from stdin and writes to stdout (no filename argument and
no `-` sentinel), so the invocation is just `docx2txt.exe <"$1"`.

The single line

    docx2txt.exe <"$1" || docx2txt.pl "$1" - || cat "$1"

uses a layered fallback rather than an `if command -v` guard,
matching the style of the other case branches in this script (e.g.
`odt2txt "$1" || cat "$1"`,
`out=$(antiword -m UTF-8 "$1") && sed ... || cat "$1"`) which all let
a missing helper fail naturally into the next fallback. On a modern
64-bit SDK the first leg always succeeds; on the legacy i686 and
bare-MSYS `git-extra` variants (which do not gain the dependency in
the previous commit) the script falls through to the Perl shim if it
is still installed, and finally `cat`s the raw `.docx` only when both
helpers are missing, exactly matching the prior
`docx2txt.pl ... || cat "$1"` semantics.

The `.exe` suffix is spelled out explicitly so that the lookup never
resolves to the old `/usr/bin/docx2txt` shell wrapper from the legacy
package, whose CLI is completely different (takes a filename, writes
`filename.txt`, no stdin/stdout interface).

`sha256sums[18]` for `astextplain` is updated to match the new file
contents; without this, `makepkg` would refuse to build the package
with an integrity-check failure.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
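
To illustrate the layered `||` fallback described in the commit message above, here is a minimal sketch. It does not call the real `docx2txt.exe` or `docx2txt.pl`; the names `helper_rs_stub` and `helper_pl_stub` are hypothetical stand-ins chosen so that the example runs anywhere, and the `convert` function is likewise only for illustration.

```sh
#!/bin/sh
# Sketch of a layered fallback chain. helper_rs_stub and helper_pl_stub
# are hypothetical placeholders, not real tools.
convert () {
	# Each leg runs only if the previous one exited non-zero. A helper
	# that is not installed yields exit status 127, which also falls
	# through to the next leg.
	helper_rs_stub <"$1" 2>/dev/null ||
	helper_pl_stub "$1" - 2>/dev/null ||
	cat "$1"
}

tmp=$(mktemp)
printf 'raw bytes\n' >"$tmp"

# Neither helper exists: both legs fail, so cat emits the raw file.
raw_result=$(convert "$tmp")

# Provide only the second helper (as a shell function): the first leg
# still fails, and the second leg now supplies the output.
helper_pl_stub () { printf 'converted by second helper\n'; }
second_result=$(convert "$tmp")

rm -f "$tmp"
printf '%s\n%s\n' "$raw_result" "$second_result"
```

The same shape means no explicit `command -v` check is needed: a missing helper simply counts as a failed leg.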
The PKGBUILD's `pkgver()` function derives the version from
`git rev-list --count` over `git-extra/`, skipping commits that only
touch `pkgver=` or 64-hex source hashes. After the previous commits
in this series add the `${MINGW_PACKAGE_PREFIX}-docx2txt-rs`
dependency and rewrite `astextplain`, the derived value moves from
`1.1.693.6dc76c4f4` to `1.1.696.8dd445c32`; commit it so the
post-build "ensure worktree is clean" check in the
`build-packages (git-extra, ...)` CI job does not fail with
"Uncommitted changes after build!" when makepkg writes the new
value back into the PKGBUILD.

This follows the existing project convention seen most recently in
`83dbeadc git-extra: bump pkgrel`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
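
As a rough illustration of a commit-count-derived version like the one described in the commit message above, the following sketch counts commits that touch one directory in a throwaway repository. It is not the actual `pkgver()` function: the `1.1.` prefix and the abbreviated-hash suffix only mimic the shape of the values quoted above (this is an assumption), the real function's exclusions are not reproduced, and the `commit_all` helper, file names and commit subjects are made up for the example.

```sh
#!/bin/sh
# Hypothetical sketch: path-limited commit count in a scratch repository.
repo=$(mktemp -d)
git -C "$repo" init -q

commit_all () {
	# Identity and signing are set per invocation so the sketch does not
	# depend on the caller's Git configuration.
	git -C "$repo" add -A &&
	git -C "$repo" -c user.name=example \
		-c user.email=example@example.invalid \
		-c commit.gpgsign=false commit -q -m "$1"
}

mkdir "$repo/git-extra" "$repo/other"
echo one >"$repo/git-extra/a.txt" && commit_all "touch git-extra"
echo two >"$repo/other/b.txt" && commit_all "touch other only"
echo three >"$repo/git-extra/a.txt" && commit_all "touch git-extra again"

# Only commits that changed something under git-extra/ are counted, so
# the commit touching other/ does not move the number.
count=$(git -C "$repo" rev-list --count HEAD -- git-extra)
short=$(git -C "$repo" rev-parse --short=9 HEAD)
derived="1.1.$count.$short"
printf '%s\n' "$derived"

rm -rf "$repo"
```

Because the count depends on history, any new commit under the directory changes the derived value, which is why the value written back by the build has to be committed.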
Done.

/deploy git-extra

The i686/x86_64 and the arm64 workflow runs were started.

/add relnote feature The diff helper handling Word documents was ported from Perl to Rust.

The workflow run was started.

The diff helper handling Word documents was [ported](#708) from Perl to Rust.
Signed-off-by: gitforwindowshelper[bot] <gitforwindowshelper-bot@users.noreply.github.com>

Maybe we should mention that we're currently keeping the Perl script as a fallback for Windows 8.1.

Let's first see whether anyone screams? ;-)

Replaces the Perl `docx2txt` MSYS package with the Rust `mingw-w64-docx2txt-rs` rewrite (https://packages.msys2.org/packages/mingw-w64-x86_64-docx2txt-rs, source at https://github.com/dscho/docx2txt-rs) for the 64-bit Git for Windows flavors. The new binary produces byte-identical output to the original on every fixture and drops the Perl dependency entirely.

To this end, the `docx2txt-rs` package is added as a dependency of `git-extra`; this will cause the (64-bit) Git for Windows SDKs to pull in that package. (32-bit does not matter: we do not include `docx2txt` in MinGit, and 32-bit `docx2txt-rs` would not be available anyway.)