Upgrade module re from PCRE to PCRE2 #9299

Open: wants to merge 22 commits into base: master

@sverker sverker (Contributor) commented Jan 14, 2025

This is done primarily for long-term maintenance reasons. The old PCRE library is no longer maintained and has been replaced by PCRE2, which is mostly backward compatible with PCRE with respect to regular expression syntax.

sverker and others added 11 commits January 14, 2025 15:31
run_pcre_tests.erl:
* Improve handling of rejected options to support multi-char options.
* Add options aftertext, mark, dupnames
* Ignore "MK:" as we don't support return of "marks".
* Ignore option xx (PCRE2_EXTENDED_MORE)
* Skip comments # and \=
* Modernize catch -> try catch
* Fix escaping in expected match result

https://www.pcre.org/current/doc/html/pcre2test.html#SEC17

says "...bytes other than 32-126 are always treated as non-printing
characters and are therefore shown as hex escapes."

And this seems to mean that printable characters are never escaped,
so for example "\x3b" is verbatim and does not mean ";".
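
The escaping rule quoted above can be sketched as follows. This is an illustrative Python sketch, not the code in run_pcre_tests.erl; the function name `decode_expected` is hypothetical, and it assumes only the two-digit `\xhh` escape form appears in the expected output.

```python
import re

# Matches a two-digit hex escape such as \x0a or \xff.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_expected(text: str) -> bytes:
    """Decode pcre2test-style expected output into raw bytes.

    Assumption: only bytes outside 32..126 are ever written as hex
    escapes, so an escape that names a printable byte (e.g. \\x3b)
    is literal text and is left untouched.
    """
    def repl(match):
        value = int(match.group(1), 16)
        if 32 <= value <= 126:
            return match.group(0)  # printable: keep the escape verbatim
        return chr(value)          # non-printing: decode to the byte

    return _HEX_ESCAPE.sub(repl, text).encode("latin-1")
```

With this rule, `a\x3bb` stays as the seven literal characters, while `a\x00b` decodes to three bytes with a NUL in the middle.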

* Fix new subject modifier syntax

where the subject string can end with something like \=anchored,notbol

Old	New
\A	anchored
\B	notbol
\>1	offset=1
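
A minimal sketch of how a test harness might split off the new-style modifiers and map the three from the table to `re:run` options. It is written in Python for illustration only; `split_subject` and `to_re_run_options` are hypothetical names, and the Erlang options are shown as plain strings.

```python
def split_subject(line: str):
    """Split a subject line into (subject, modifiers) at the last '\\='."""
    subject, sep, mods = line.rpartition("\\=")
    if not sep:
        return line, {}  # no modifier section present
    parsed = {}
    for item in mods.split(","):
        item = item.strip()
        if not item:
            continue
        name, eq, value = item.partition("=")
        parsed[name] = value if eq else True
    return subject, parsed


def to_re_run_options(modifiers: dict):
    """Translate the modifiers from the table; collect the rest as unsupported."""
    options, unsupported = [], []
    for name, value in modifiers.items():
        if name in ("anchored", "notbol") and value is True:
            options.append(name)
        elif name == "offset" and value is not True:
            options.append("{offset, %d}" % int(value))
        else:
            unsupported.append(name)  # e.g. ovector=, see the Todo in the commit message
    return options, unsupported
```

For example, `abc\=anchored,offset=1` yields the subject `abc` and the options `anchored` and `{offset, 1}`.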

Todo:
Could ovector= be converted to re:run option {capture, ..}

* Repeated string syntax

Ex: "\[abc]{4}" is "abcabcabcabc"
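
The repeat syntax can be expanded with a single substitution. The following Python sketch is illustrative only and assumes the repeated text contains no `]`; `expand_repeats` is a hypothetical name, not a function from the test suite.

```python
import re

# Matches \[<characters>]{<count>} as it appears in a subject line.
_REPEAT = re.compile(r"\\\[(.*?)\]\{(\d+)\}")


def expand_repeats(subject: str) -> str:
    """Replace each \\[text]{n} with text repeated n times."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), subject)
```

Applied to `\[abc]{4}` this gives `abcabcabcabc`; text without the leading backslash is left unchanged.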

* Improve repl and skip generation
Perl version 5.34.0 was used,
the default installed on ELX 6 Ubuntu 22.04.5 LTS.
I give up.
This is such a contrived test. I doubt there even is a sane result that
anyone would expect.
Does not compile on PCRE. It says

"an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)"

Difference in PCRE2 10.44 vs perl 5.34.0

Perl matches "axabc" as "a(*COMMIT)b"

PCRE2 does not match.
Does not compile on PCRE. It says

"lookbehind assertion is not fixed length"

Difference in PCRE2 10.44 vs perl 5.34.0

Perl complains on stderr:
"Variable length lookbehind is experimental in regex..."

and returns strange match results.
Same failure with old PCRE.
I think our own implementation of option 'global' is to blame.
from commit 2530d5c5fc9056bdf7f63c8e6b9ae198324468ed
in pcre2 repo.

And removed old pcre files.
@sverker sverker added the team:VM (Assigned to OTP team VM) and enhancement labels Jan 14, 2025
@sverker sverker self-assigned this Jan 14, 2025
github-actions bot (Contributor) commented Jan 14, 2025

CT Test Results

    4 files    226 suites   1h 53m 41s ⏱️
3 645 tests 3 552 ✅  93 💤 0 ❌
4 736 runs  4 620 ✅ 116 💤 0 ❌

Results for commit e34c598.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

// Erlang/OTP Github Action Bot

sverker and others added 9 commits January 14, 2025 16:21
A better solution would be to silence these warnings in particular
somehow:

warning: ‘heapframes_size’ may be used uninitialized in this function
warning: ‘frame_size’ may be used uninitialized in this function
warning: ‘bumpalong_limit’ may be used uninitialized in this function
:
@sverker sverker force-pushed the sverker/re-pcre2/OTP-19431 branch from afa341d to a4beeba January 15, 2025 20:26
* No unnecessary and ugly memcpy of pcre2_code
  (which was done for every re:run call with precompiled regex)

* Return precompiled regex as magic ref
  instead of exposed unsafe raw binary.