Skip to content

Conversation

sammacbeth
Copy link
Collaborator

@sammacbeth sammacbeth commented Oct 10, 2025

Task/Issue URL: https://app.asana.com/1/137249556945/project/488551667048375/task/1211076317769725?focus=true

Description:

  • Sorts rules in compact format to allow for fast filtering on the client side.
  • Adds an index to allow filtering of generic, frame and specific rules independantly.
  • Adds an implementation of filtering, and enables it in the test extension.
  • Adds tests for the filtering implementation, to ensure that rules are always included when expected.

Steps to test this PR:

@sammacbeth sammacbeth requested a review from muodov October 10, 2025 14:41
@sammacbeth sammacbeth changed the title Indexes compact rules for faster filtering Index compact rules for faster filtering Oct 10, 2025
@daxtheduck
Copy link
Collaborator

CI run finished. Artifacts ZIP for the review tool

@daxtheduck
Copy link
Collaborator

CI run finished. Artifacts ZIP for the review tool

@daxtheduck
Copy link
Collaborator

CI run finished. Artifacts ZIP for the review tool

@daxtheduck
Copy link
Collaborator

CI run finished. Artifacts ZIP for the review tool

@daxtheduck
Copy link
Collaborator

CI run finished. Artifacts ZIP for the review tool

@muodov muodov requested a review from Copilot October 14, 2025 09:03
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces rule filtering functionality to improve performance by indexing and sorting rules in compact format, enabling fast client-side filtering based on context (main frame vs subframe and URL patterns).

  • Adds indexing to compact rules with separate ranges for generic, frame, and specific rules
  • Implements a filtering system that selects only relevant rules based on URL and frame context
  • Updates regex patterns in multiple rule files to be more restrictive and compatible with test URL generation

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
lib/encoding.ts Adds IndexedCMPRuleset type, rule sorting logic, and filterCompactRules implementation
tests-wtr/rules/rule-filtering.test.ts Comprehensive test suite for rule filtering functionality
scripts/generate-rule-test-urls.js Script to generate test URLs from rule patterns using randexp
rules/autoconsent/*.json Updates URL patterns to be more restrictive and compatible with regex generation
addon/background.ts Integrates rule filtering in the background script
build.sh Copies compact-rules.json to distribution directories
package.json Adds randexp dependency for URL generation

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


const genericStrings = buildStrings(existingCompactRules?.s || [], rules.slice(0, genericRuleEnd));
const frameStrings = buildStrings(genericStrings, rules.slice(0, frameRuleEnd));
const strings = buildStrings(frameStrings || [], rules);
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 187 uses the result of buildStrings (genericStrings) as the first parameter, but line 188 uses frameStrings with a fallback to empty array. This is inconsistent - frameStrings should never be null/undefined since buildStrings always returns an array.

Suggested change
const strings = buildStrings(frameStrings || [], rules);
const strings = buildStrings(frameStrings, rules);

Copilot uses AI. Check for mistakes.

Comment on lines +176 to +184
const genericRuleEnd = rules.findIndex((r) => {
return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
});
const frameRuleStart = rules.findIndex((r) => {
return r.runContext?.frame === true;
});
const frameRuleEnd = rules.findIndex((r, i) => {
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
});
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findIndex returns -1 when no element is found, but this value is used directly as an array slice index. If all rules are generic (no urlPattern), genericRuleEnd will be -1, causing incorrect slicing behavior.

Suggested change
const genericRuleEnd = rules.findIndex((r) => {
return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
});
const frameRuleStart = rules.findIndex((r) => {
return r.runContext?.frame === true;
});
const frameRuleEnd = rules.findIndex((r, i) => {
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
});
let genericRuleEnd = rules.findIndex((r) => {
return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
});
if (genericRuleEnd === -1) genericRuleEnd = rules.length;
let frameRuleStart = rules.findIndex((r) => {
return r.runContext?.frame === true;
});
if (frameRuleStart === -1) frameRuleStart = rules.length;
let frameRuleEnd = rules.findIndex((r, i) => {
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
});
if (frameRuleEnd === -1) frameRuleEnd = rules.length;

Copilot uses AI. Check for mistakes.

Comment on lines +182 to +184
const frameRuleEnd = rules.findIndex((r, i) => {
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
});
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to genericRuleEnd, frameRuleEnd can be -1 if no matching element is found, which will cause incorrect array slicing when used in slice() operations.

Copilot uses AI. Check for mistakes.

Comment on lines +48 to +52
genericRuleRange: [number, number]; // [startIndex, endIndex] of rules that are generic
frameRuleRange: [number, number]; // [startIndex, endIndex] of rules that run in frames
specificRuleRange: [number, number]; // [startIndex, endIndex] of rules that are specific
genericStringEnd: number; // end index of strings that are used by generic rules
frameStringEnd: number; // end index of strings that are used by frame rules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about the order inside the strings and rules arrays? So it's easier to understand how these numbers correspond to the actual data structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as I understand, all "end" indices are +1 (not inclusive on the right side), right? I'd make an explicit comment about that.

r: CompactCMPRule[];
};

export type IndexedCMPRuleset = CompactCMPRuleset & {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to keep CompactCMPRuleset as a separate type?

const strings = buildStrings(existingCompactRules?.s || [], rules);
export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): IndexedCMPRuleset {
rules.sort((a, b) => {
const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern || r.runContext.urlPattern === '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is equivalent, right?

Suggested change
const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern || r.runContext.urlPattern === '';
const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern;


export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): CompactCMPRuleset {
const strings = buildStrings(existingCompactRules?.s || [], rules);
export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): IndexedCMPRuleset {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment/jsdoc about the expected sorting?

});

const genericRuleEnd = rules.findIndex((r) => {
return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
return r.runContext?.urlPattern;

Comment on lines +176 to +182
const genericRuleEnd = rules.findIndex((r) => {
return r.runContext?.urlPattern && r.runContext.urlPattern !== '';
});
const frameRuleStart = rules.findIndex((r) => {
return r.runContext?.frame === true;
});
const frameRuleEnd = rules.findIndex((r, i) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some tests for edge cases? Such as when there's no frame rules, or no specific rules etc.

return r.runContext?.frame === true;
});
const frameRuleEnd = rules.findIndex((r, i) => {
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is equivalent, isn't it?

Suggested change
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart;
return !r.runContext?.frame && i >= frameRuleStart;

Comment on lines +186 to +188
const genericStrings = buildStrings(existingCompactRules?.s || [], rules.slice(0, genericRuleEnd));
const frameStrings = buildStrings(genericStrings, rules.slice(0, frameRuleEnd));
const strings = buildStrings(frameStrings || [], rules);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this (try to) preserve the indices in site-specific rules?

}

function shouldRunRuleInContext(rule: CompactCMPRule, mainFrame: boolean, url: string): boolean {
const runContext = rule[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this 4 and the numbers below to constants or enum values? I think it'd be very difficult to understand for anyone new to the codebase.

testUrls.forEach(([cmp, testUrl]) => {
const rule = decodedRules.find((rule) => rule.name === cmp);
if (!rule) {
// rule from the test list doesn't exist anymore.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we regenerate the test urls on each run? Or is it not reliable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants