Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid regex does not match #257

Open
ghlecl opened this issue May 27, 2022 · 4 comments
Open

Valid regex does not match #257

ghlecl opened this issue May 27, 2022 · 4 comments
Labels

Comments

@ghlecl
Copy link

ghlecl commented May 27, 2022

Hello. I have the following regex:

#include "ctll/fixed_string.hpp"
#include "ctre.hpp"

static constexpr auto ptv_regex = ctll::fixed_string(
    "^(\\d+)(?:[vV](\\d+))?[Pp][Tt][Vv](?:(\\d+)|(\\d+)([AaBbCc]?))?(!)?(?:$|\\^(.+))"
);

which I try to match using:

auto found = ctre::search<ptv_regex>( "1v2PTV1a" );

On regex101.com, this regex matches for the python, PCRE and PCRE2 regex "engines", so I believe the regex is valid and OK. With the CTRE library (version 3.6 from vcpkg and that which is available as trunk on compiler explorer today), it does not match.

Sorry that I don't have the knowledge to try and pinpoint the problem. Just thought I would report it.

@Alcaro
Copy link

Alcaro commented May 27, 2022

Yep, here's a bug. I can reduce it to

#include "ctll/fixed_string.hpp"
#include "ctre.hpp"

static constexpr auto ptv_regex = ctll::fixed_string(
    "(?:a|ab)?"
);

bool x()
{
    auto found = ctre::match<ptv_regex>( "ab" );
    return found;
}

https://godbolt.org/z/jqn9vrrWc

@hanickadot
Copy link
Owner

Thanks for the minimisation, it seems to me there is a deeper problem, and I'm currently don't know how to fix it. Will try to look at it soon.

@hanickadot hanickadot added the bug label May 27, 2022
@hanickadot
Copy link
Owner

It seems to related when you have a common prefix in selection inside a cycle.

@iulian-rusu
Copy link

I am not an expert on how this library implements matching, but it seems to me that there are two factors:

  1. Alternations of the form a|ab will always try to match a before ab.
  2. Cycles are implemented by repeatedly calling evaluate on the content of the cycle:
while (less_than_or_infinite<B>(i)) {
    auto inner_result = evaluate(..., ctll::list<Content..., end_cycle_mark>());
    // ... match Tail if inner_result matches
}

The problem is that, at some point, the expression a|ab has to figure that it needs to match ab immediately instead of a, even if it can only match the prefix a. This cannot happen if the content of the cycle is matched as ctll::list<Content..., end_cycle_mark>, because Content has no idea that Tail comes after it and that it might need to backtrack based on Tail's ability to match. The evaluation of Content thinks the regex ends at end_cycle_mark.

I am writing this because I have also implemented compile-time regular expression matching as a hobby-project (inspired by this library, of course) and I have come across the same bug in my code. The solution that i have found involves using a callback function that calls evaluate for the rest of the expression and passing this function to the evaluation method for Content, so that, after matching Content, the same evaluate method can use the callback to check if Tail matches. It's basically a continuation-passing style approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants