Index compact rules for faster filtering #1035

sammacbeth · 2025-10-10T14:41:39Z

Task/Issue URL: https://app.asana.com/1/137249556945/project/488551667048375/task/1211076317769725?focus=true

Description:

Sorts rules in compact format to allow for fast filtering on the client side.
Adds an index to allow filtering of generic, frame and specific rules independently.
Adds an implementation of filtering, and enables it in the test extension.
Adds tests for the filtering implementation, to ensure that rules are always included when expected.

Steps to test this PR:

daxtheduck · 2025-10-10T16:11:11Z

CI run finished. Artifacts ZIP for the review tool

daxtheduck · 2025-10-13T09:55:38Z

CI run finished. Artifacts ZIP for the review tool

daxtheduck · 2025-10-13T10:13:58Z

CI run finished. Artifacts ZIP for the review tool

daxtheduck · 2025-10-13T12:55:58Z

CI run finished. Artifacts ZIP for the review tool

daxtheduck · 2025-10-14T01:23:00Z

CI run finished. Artifacts ZIP for the review tool

Copilot

Pull Request Overview

This PR introduces rule filtering functionality to improve performance by indexing and sorting rules in compact format, enabling fast client-side filtering based on context (main frame vs subframe and URL patterns).

- Adds indexing to compact rules with separate ranges for generic, frame, and specific rules
- Implements a filtering system that selects only relevant rules based on URL and frame context
- Updates regex patterns in multiple rule files to be more restrictive and compatible with test URL generation
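
To make the idea in this overview concrete, here is a minimal sketch of range-based filtering. It is not the PR's `filterCompactRules`: the `RuleSketch` shape, the `filterRulesSketch` name, the half-open ranges, and the policy (generic and URL-matching specific rules in the main frame, only frame rules in subframes) are all assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes: the real compact types in lib/encoding.ts differ.
type RuleSketch = { name: string; urlPattern?: string };
type RuleRange = [number, number]; // assumed half-open: [start, end)

interface IndexedRulesSketch {
    rules: RuleSketch[];
    genericRuleRange: RuleRange;
    frameRuleRange: RuleRange;
    specificRuleRange: RuleRange;
}

// Assumed policy: subframes only see frame rules; the main frame sees every
// generic rule plus the specific rules whose URL pattern matches.
function filterRulesSketch(index: IndexedRulesSketch, mainFrame: boolean, url: string): RuleSketch[] {
    const take = ([start, end]: RuleRange) => index.rules.slice(start, end);
    const matchesUrl = (r: RuleSketch) => r.urlPattern === undefined || new RegExp(r.urlPattern).test(url);
    if (!mainFrame) {
        return take(index.frameRuleRange).filter(matchesUrl);
    }
    return [...take(index.genericRuleRange), ...take(index.specificRuleRange).filter(matchesUrl)];
}

const sketchIndex: IndexedRulesSketch = {
    rules: [
        { name: 'generic-a' },
        { name: 'generic-b' },
        { name: 'frame-a', urlPattern: '^https://frames\\.example/' },
        { name: 'specific-shop', urlPattern: '^https://shop\\.example/' },
        { name: 'specific-news', urlPattern: '^https://news\\.example/' },
    ],
    genericRuleRange: [0, 2],
    frameRuleRange: [2, 3],
    specificRuleRange: [3, 5],
};

const mainFrameRules = filterRulesSketch(sketchIndex, true, 'https://shop.example/cart').map((r) => r.name);
const subFrameRules = filterRulesSketch(sketchIndex, false, 'https://frames.example/widget').map((r) => r.name);
```

Because the rules are sorted, each context only touches a contiguous slice, and regex matching is limited to the rules that actually carry a URL pattern.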

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| lib/encoding.ts | Adds IndexedCMPRuleset type, rule sorting logic, and filterCompactRules implementation |
| tests-wtr/rules/rule-filtering.test.ts | Comprehensive test suite for rule filtering functionality |
| scripts/generate-rule-test-urls.js | Script to generate test URLs from rule patterns using randexp |
| rules/autoconsent/*.json | Updates URL patterns to be more restrictive and compatible with regex generation |
| addon/background.ts | Integrates rule filtering in the background script |
| build.sh | Copies compact-rules.json to distribution directories |
| package.json | Adds randexp dependency for URL generation |


Copilot · 2025-10-14T09:04:44Z

lib/encoding.ts

+
+    const genericStrings = buildStrings(existingCompactRules?.s || [], rules.slice(0, genericRuleEnd));
+    const frameStrings = buildStrings(genericStrings, rules.slice(0, frameRuleEnd));
+    const strings = buildStrings(frameStrings || [], rules);


Line 187 uses the result of buildStrings (genericStrings) as the first parameter, but line 188 uses frameStrings with a fallback to empty array. This is inconsistent - frameStrings should never be null/undefined since buildStrings always returns an array.

Suggested change

const strings = buildStrings(frameStrings || [], rules);

const strings = buildStrings(frameStrings, rules);

Copilot · 2025-10-14T09:04:45Z

lib/encoding.ts

+    const genericRuleEnd = rules.findIndex((r) => {
+        return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
+    });
+    const frameRuleStart = rules.findIndex((r) => {
+        return r.runContext?.frame === true;
+    });
+    const frameRuleEnd = rules.findIndex((r, i) => {
+        return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
+    });


findIndex returns -1 when no element is found, but this value is used directly as an array slice index. If all rules are generic (no urlPattern), genericRuleEnd will be -1, causing incorrect slicing behavior.

Suggested change

    const genericRuleEnd = rules.findIndex((r) => {
        return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
    });
    const frameRuleStart = rules.findIndex((r) => {
        return r.runContext?.frame === true;
    });
    const frameRuleEnd = rules.findIndex((r, i) => {
        return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
    });

    let genericRuleEnd = rules.findIndex((r) => {
        return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
    });
    if (genericRuleEnd === -1) genericRuleEnd = rules.length;
    let frameRuleStart = rules.findIndex((r) => {
        return r.runContext?.frame === true;
    });
    if (frameRuleStart === -1) frameRuleStart = rules.length;
    let frameRuleEnd = rules.findIndex((r, i) => {
        return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
    });
    if (frameRuleEnd === -1) frameRuleEnd = rules.length;
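
The failure mode described in this comment is easy to reproduce in isolation. The snippet below is a standalone illustration with made-up rule names, not code from the PR; `endOrLength` is a hypothetical helper that expresses the same fallback as the suggested change.

```typescript
// Made-up rule names: every rule here is "generic", so no specific rule is found.
const sampleRules = ['generic-1', 'generic-2', 'generic-3'];

// findIndex returns -1 when nothing matches.
const firstSpecific = sampleRules.findIndex((r) => r.startsWith('specific'));

// slice(0, -1) means "everything except the last element", so one rule is silently dropped.
const slicedWithMinusOne = sampleRules.slice(0, firstSpecific);

// Hypothetical helper: treat "not found" as "the range runs to the end of the array".
const endOrLength = (index: number, length: number): number => (index === -1 ? length : index);
const slicedWithFallback = sampleRules.slice(0, endOrLength(firstSpecific, sampleRules.length));
```

With the fallback, an all-generic rule list yields a generic range that covers the whole array instead of losing its last entry.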

Copilot · 2025-10-14T09:04:45Z

lib/encoding.ts

+    const frameRuleEnd = rules.findIndex((r, i) => {
+        return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
+    });


Similar to genericRuleEnd, frameRuleEnd can be -1 if no matching element is found, which will cause incorrect array slicing when used in slice() operations.

rules/autoconsent/europa-eu.json

muodov · 2025-10-14T07:32:22Z

lib/encoding.ts

+        genericRuleRange: [number, number]; // [startIndex, endIndex] of rules that are generic
+        frameRuleRange: [number, number]; // [startIndex, endIndex] of rules that run in frames
+        specificRuleRange: [number, number]; // [startIndex, endIndex] of rules that are specific
+        genericStringEnd: number; // end index of strings that are used by generic rules
+        frameStringEnd: number; // end index of strings that are used by frame rules


Could you add a comment about the order inside the strings and rules arrays? So it's easier to understand how these numbers correspond to the actual data structure.

Also, as I understand, all "end" indices are +1 (not inclusive on the right side), right? I'd make an explicit comment about that.
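
One possible shape for the requested comment is sketched below. It assumes that the answer to the question above is yes (end indices are exclusive) and that rules are ordered generic, then frame, then specific; neither is confirmed in this thread. `RuleIndexSketch` and `sliceRange` are hypothetical names; only the field names come from the diff.

```typescript
/**
 * Sketch of index documentation (assumptions, not confirmed by the PR):
 * - rules are assumed to be ordered: generic rules, then frame rules, then specific rules;
 * - strings are assumed to be ordered: strings used by generic rules first,
 *   then those first needed by frame rules, then everything else;
 * - every range is half-open, [start, end): `end` is one past the last item,
 *   so it can be passed straight to Array.prototype.slice.
 */
interface RuleIndexSketch {
    genericRuleRange: [number, number]; // [start, end) of generic rules
    frameRuleRange: [number, number]; // [start, end) of rules that run in frames
    specificRuleRange: [number, number]; // [start, end) of site-specific rules
    genericStringEnd: number; // strings[0, genericStringEnd) are used by generic rules
    frameStringEnd: number; // strings[0, frameStringEnd) cover generic and frame rules
}

// Hypothetical helper showing how a half-open range maps onto slice().
function sliceRange<T>(items: T[], range: [number, number]): T[] {
    return items.slice(range[0], range[1]);
}

const exampleIndex: RuleIndexSketch = {
    genericRuleRange: [0, 2],
    frameRuleRange: [2, 3],
    specificRuleRange: [3, 4],
    genericStringEnd: 5,
    frameStringEnd: 7,
};
const frameSlice = sliceRange(['g1', 'g2', 'f1', 's1'], exampleIndex.frameRuleRange);
```

With half-open ranges, the end of one range equals the start of the next, and an empty range is simply `[n, n]`.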

muodov · 2025-10-14T07:33:15Z

lib/encoding.ts

    r: CompactCMPRule[];
 };

+export type IndexedCMPRuleset = CompactCMPRuleset & {


Is it necessary to keep CompactCMPRuleset as a separate type?

muodov · 2025-10-14T07:38:02Z

lib/encoding.ts

-    const strings = buildStrings(existingCompactRules?.s || [], rules);
+export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): IndexedCMPRuleset {
+    rules.sort((a, b) => {
+        const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern || r.runContext.urlPattern === '';


this is equivalent, right?

Suggested change

const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern || r.runContext.urlPattern === '';

const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern;
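
A quick way to check the equivalence: the empty string is falsy in JavaScript, so `!r.runContext?.urlPattern` is already `true` when the pattern is `''`. The snippet below uses a pared-down `RuleLike` type (an assumption, standing in for `AutoConsentCMPRule`) and compares both forms on the relevant cases.

```typescript
// Pared-down stand-in for AutoConsentCMPRule; only the field under discussion is modelled.
type RuleLike = { runContext?: { urlPattern?: string } };

const isGenericVerbose = (r: RuleLike) => !r.runContext?.urlPattern || r.runContext.urlPattern === '';
const isGenericShort = (r: RuleLike) => !r.runContext?.urlPattern;

const urlPatternCases: RuleLike[] = [
    {}, // no runContext at all
    { runContext: {} }, // runContext without urlPattern
    { runContext: { urlPattern: '' } }, // empty pattern
    { runContext: { urlPattern: '^https://example\\.com/' } }, // real pattern
];

const verboseResults = urlPatternCases.map(isGenericVerbose);
const shortResults = urlPatternCases.map(isGenericShort);
```

Both forms classify the first three cases as generic and the last as specific.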

muodov · 2025-10-14T07:39:14Z

lib/encoding.ts


-export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): CompactCMPRuleset {
-    const strings = buildStrings(existingCompactRules?.s || [], rules);
+export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): IndexedCMPRuleset {


add a comment/jsdoc about the expected sorting?
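
As a rough illustration of what such a JSDoc could pin down, here is a sketch of a documented sort. The ordering (generic, then frame, then other specific rules) is an assumption inferred from the index field names, and `sortRank` and `sortForIndexing` are hypothetical names, not functions from the PR.

```typescript
// Pared-down stand-in for AutoConsentCMPRule.
type SortableRule = { name: string; runContext?: { urlPattern?: string; frame?: boolean } };

// Assumed grouping: 0 = generic (no URL pattern), 1 = frame rule, 2 = other specific rule.
function sortRank(rule: SortableRule): number {
    if (!rule.runContext?.urlPattern) return 0;
    if (rule.runContext?.frame) return 1;
    return 2;
}

/**
 * Returns a copy of `rules` grouped for indexing.
 *
 * Assumed order: generic rules first, then frame rules, then the remaining
 * specific rules. Array.prototype.sort is stable (ES2019+), so rules keep
 * their original relative order within each group.
 */
function sortForIndexing(rules: SortableRule[]): SortableRule[] {
    return [...rules].sort((a, b) => sortRank(a) - sortRank(b));
}

const sortedNames = sortForIndexing([
    { name: 'specific-a', runContext: { urlPattern: '^https://a\\.example/' } },
    { name: 'generic-a' },
    { name: 'frame-a', runContext: { urlPattern: '^https://f\\.example/', frame: true } },
    { name: 'generic-b', runContext: { urlPattern: '' } },
]).map((r) => r.name);
```

Unlike the `rules.sort(...)` call in the diff, this sketch copies the array first, so the caller's input is not mutated; whether that matters for `encodeRules` is a design choice for the author.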

muodov · 2025-10-14T07:41:04Z

lib/encoding.ts

+    });
+
+    const genericRuleEnd = rules.findIndex((r) => {
+        return r.runContext?.urlPattern && r.runContext.urlPattern !== '';


Suggested change

return r.runContext?.urlPattern && r.runContext.urlPattern !== '';

return r.runContext?.urlPattern;

muodov · 2025-10-14T07:46:21Z

lib/encoding.ts

+    const genericRuleEnd = rules.findIndex((r) => {
+        return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
+    });
+    const frameRuleStart = rules.findIndex((r) => {
+        return r.runContext?.frame === true;
+    });
+    const frameRuleEnd = rules.findIndex((r, i) => {


Could you add some tests for edge cases? Such as when there's no frame rules, or no specific rules etc.

muodov · 2025-10-14T08:01:59Z

lib/encoding.ts

+        return r.runContext?.frame === true;
+    });
+    const frameRuleEnd = rules.findIndex((r, i) => {
+        return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;


this is equivalent, isn't it?

Suggested change

return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;

return !r.runContext?.frame && i >= frameRuleStart;
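
The same reasoning applies here: when `runContext` is missing, `r.runContext?.frame` evaluates to `undefined`, which negates to `true`, so the explicit `!r.runContext` check is redundant. A standalone check with a pared-down `FrameRuleLike` type (an assumption for illustration):

```typescript
// Pared-down stand-in for AutoConsentCMPRule; only the frame flag is modelled.
type FrameRuleLike = { runContext?: { frame?: boolean } };

const notFrameVerbose = (r: FrameRuleLike) => !r.runContext || !r.runContext?.frame;
const notFrameShort = (r: FrameRuleLike) => !r.runContext?.frame;

const frameCases: FrameRuleLike[] = [
    {}, // no runContext
    { runContext: {} }, // frame not set
    { runContext: { frame: false } },
    { runContext: { frame: true } },
];

const frameVerboseResults = frameCases.map(notFrameVerbose);
const frameShortResults = frameCases.map(notFrameShort);
```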

muodov · 2025-10-14T08:03:16Z

lib/encoding.ts

+    const genericStrings = buildStrings(existingCompactRules?.s || [], rules.slice(0, genericRuleEnd));
+    const frameStrings = buildStrings(genericStrings, rules.slice(0, frameRuleEnd));
+    const strings = buildStrings(frameStrings || [], rules);


Will this (try to) preserve the indices in site-specific rules?
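
Whether indices are preserved depends on how `buildStrings` treats its first argument, which this thread does not show. The sketch below uses a hypothetical append-only `internStrings` helper to illustrate the property the layered calls appear to rely on: if existing entries never move, each table is a prefix of the next, and the lengths can serve as `genericStringEnd` and `frameStringEnd`.

```typescript
// Hypothetical append-only string table; NOT the PR's buildStrings.
// Existing entries keep their index; unseen strings are appended at the end.
function internStrings(existing: readonly string[], needed: readonly string[]): string[] {
    const table = [...existing];
    const seen = new Set(existing);
    for (const s of needed) {
        if (!seen.has(s)) {
            seen.add(s);
            table.push(s);
        }
    }
    return table;
}

// Made-up selector strings for three layers of rules.
const genericTable = internStrings([], ['#accept', '.banner']);
const frameTable = internStrings(genericTable, ['.banner', 'iframe.cmp']);
const fullTable = internStrings(frameTable, ['#accept', '.site-popup']);

const genericStringEndSketch = genericTable.length;
const frameStringEndSketch = frameTable.length;
```

Under this assumption, strings carried over from a previous table keep their positions. If the previous table already contains site-specific strings, they would sit inside the generic prefix, which may be part of what the question above is probing.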

muodov · 2025-10-14T08:40:02Z

lib/encoding.ts

+}
+
+function shouldRunRuleInContext(rule: CompactCMPRule, mainFrame: boolean, url: string): boolean {
+    const runContext = rule[4];


Let's change this 4 and the numbers below to constants or enum values? I think it'd be very difficult to understand for anyone new to the codebase.
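
A minimal sketch of what named positions could look like. Only position 4 (the run context) is taken from the diff; the `RuleField` object, the `getRunContext` helper, and the object-shaped run context are assumptions, since the real compact layout is not shown here.

```typescript
// Named tuple positions. Only RunContext = 4 comes from the diff above;
// other fields would be added with their real positions.
const RuleField = {
    RunContext: 4,
} as const;

// Hypothetical shapes: the real CompactCMPRule and its run context are likely more compact.
type RunContextSketch = { frame?: boolean; urlPattern?: string };
type CompactRuleSketch = [unknown, unknown, unknown, unknown, RunContextSketch];

function getRunContext(rule: CompactRuleSketch): RunContextSketch {
    // Reads as "the run context field" rather than a bare magic number.
    return rule[RuleField.RunContext];
}

const sketchRule: CompactRuleSketch = [0, 'example-cmp', 0, 0, { frame: true }];
const sketchRunContext = getRunContext(sketchRule);
```

A plain `as const` object is used instead of an `enum` so the values stay ordinary numbers at runtime; either would address the readability concern.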

muodov · 2025-10-14T09:07:50Z

tests-wtr/rules/rule-filtering.test.ts

+    testUrls.forEach(([cmp, testUrl]) => {
+        const rule = decodedRules.find((rule) => rule.name === cmp);
+        if (!rule) {
+            // rule from the test list doesn't exist anymore.


Should we regenerate the test urls on each run? Or is it not reliable?

sammacbeth added 4 commits October 7, 2025 14:40

Rule index for efficient filtering

6431984

Use background filtering in test extension

895ac5f

Fixes and improvements

3328587

Add tests for rule filtering

83a4e44

sammacbeth requested a review from muodov October 10, 2025 14:41

sammacbeth added 2 commits October 10, 2025 17:04

Lint fix

8623223

Fix some rule regexes

08bca71

sammacbeth changed the title ~~Indexes compact rules for faster filtering~~ Index compact rules for faster filtering Oct 10, 2025

muodov requested a review from Copilot October 14, 2025 09:03

Copilot AI reviewed Oct 14, 2025


muodov requested changes Oct 14, 2025


3 participants

sammacbeth commented Oct 10, 2025 (edited by joshliebe)