Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for RegExp.leftContext RegExp.rightContext #923

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ammarahm-ed
Copy link
Contributor

@ammarahm-ed ammarahm-ed commented Feb 20, 2025

This pull request to quickjs.c introduces several changes to enhance the handling of regular expression contexts within the JSContext structure. The most important changes include adding new fields to store RegExp context information, initializing these fields in context creation, freeing them in context destruction, and implementing new functions to manage these contexts.

Enhancements to RegExp context handling:

  • quickjs.c: Added new fields regexp_left_ctx, regexp_right_ctx, regexp_last_match_str, regexp_last_match_start, and regexp_last_match_end to the JSContext structure to store RegExp context information.
  • quickjs.c: Initialized the new RegExp context fields in the JS_NewContextRaw function.
  • quickjs.c: Freed the new RegExp context fields in the JS_FreeContext function to prevent memory leaks.
  • quickjs.c: Implemented new functions js_regexp_get_leftContext and js_regexp_get_rightContext to retrieve and manage the left and right context of the last RegExp match.
  • quickjs.c: Updated the js_regexp_exec function to store the last match string and its start and end positions in the new context fields.
// Define a string to test
const testString = "Hello, world!";

// Create a regular expression to match the word "world"
const regex = /world/;

// Execute the regular expression on the test string
const result = regex.exec(testString);

// Check if the match was successful
if (result) {
    // Print the matched result
    console.log("Matched:", result[0]);

    // Print the left context
    console.log("Left Context:", RegExp.leftContext);

    // Print the right context
    console.log("Right Context:", RegExp.rightContext);
} else {
    console.log("No match found.");
}

Closes #921

Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem I see with this approach is that regexp_last_match_str can stay around indefinitely (also applies to the substrings.)

Imagine you run a regex on a 100 MB string. That string can't be reclaimed until the next time a regexp runs - which may be never.

Comment on lines +412 to +414
JSValue regexp_left_ctx; // RegExp.leftContext
JSValue regexp_right_ctx; // RegExp.rightContext
JSValue regexp_last_match_str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to mark these in JS_MarkContext, otherwise the GC might free them prematurely.

@@ -43821,6 +43835,24 @@ static JSValue js_regexp_escape(JSContext *ctx, JSValue this_val,
return string_buffer_end(b);
}

static JSValue js_regexp_get_leftContext(JSContext *ctx, JSValue this_val) {
if (ctx->regexp_last_match_start >= 0) {
JS_FreeValue(ctx, ctx->regexp_left_ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JS_FreeValue(ctx, ctx->regexp_left_ctx);
JS_FreeValue(ctx, ctx->regexp_left_ctx);
ctx->regexp_left_ctx = JS_UNDEFINED;

Avoids double frees, etc. Applies to the other JS_FreeValue calls below too.

static JSValue js_regexp_get_leftContext(JSContext *ctx, JSValue this_val) {
if (ctx->regexp_last_match_start >= 0) {
JS_FreeValue(ctx, ctx->regexp_left_ctx);
ctx->regexp_left_ctx = js_sub_string(ctx, JS_VALUE_GET_STRING(ctx->regexp_last_match_str), 0, ctx->regexp_last_match_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a strict line length but try to keep it under 80 columns whenever you can, by doing something like:

JSString *p;
// ...
p = JS_VALUE_GET_STRING(ctx->regexp_last_match_str)
ctx->regexp_left_ctx = js_sub_string(ctx, p, 0, ctx->regexp_last_match_start);

Also, bad things will happen when ctx->regexp_last_match_str is not a string. I realize that's guarded by the ctx->regexp_last_match_start >= 0 check but it seems kind of fragile. Why not:

if (JS_IsString(ctx->regexp_last_match_str)) {
    // ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RegExp.leftContext unsupported
2 participants