-
Notifications
You must be signed in to change notification settings - Fork 25
Index compact rules for faster filtering #1035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
CI run finished. Artifacts ZIP for the review tool |
CI run finished. Artifacts ZIP for the review tool |
CI run finished. Artifacts ZIP for the review tool |
CI run finished. Artifacts ZIP for the review tool |
CI run finished. Artifacts ZIP for the review tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces rule filtering functionality to improve performance by indexing and sorting rules in compact format, enabling fast client-side filtering based on context (main frame vs subframe and URL patterns).
- Adds indexing to compact rules with separate ranges for generic, frame, and specific rules
- Implements a filtering system that selects only relevant rules based on URL and frame context
- Updates regex patterns in multiple rule files to be more restrictive and compatible with test URL generation
Reviewed Changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
lib/encoding.ts | Adds IndexedCMPRuleset type, rule sorting logic, and filterCompactRules implementation |
tests-wtr/rules/rule-filtering.test.ts | Comprehensive test suite for rule filtering functionality |
scripts/generate-rule-test-urls.js | Script to generate test URLs from rule patterns using randexp |
rules/autoconsent/*.json | Updates URL patterns to be more restrictive and compatible with regex generation |
addon/background.ts | Integrates rule filtering in the background script |
build.sh | Copies compact-rules.json to distribution directories |
package.json | Adds randexp dependency for URL generation |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
||
const genericStrings = buildStrings(existingCompactRules?.s || [], rules.slice(0, genericRuleEnd)); | ||
const frameStrings = buildStrings(genericStrings, rules.slice(0, frameRuleEnd)); | ||
const strings = buildStrings(frameStrings || [], rules); |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 187 uses the result of buildStrings (genericStrings) as the first parameter, but line 188 uses frameStrings with a fallback to empty array. This is inconsistent - frameStrings should never be null/undefined since buildStrings always returns an array.
const strings = buildStrings(frameStrings || [], rules); | |
const strings = buildStrings(frameStrings, rules); |
Copilot uses AI. Check for mistakes.
const genericRuleEnd = rules.findIndex((r) => { | ||
return r.runContext?.urlPattern && r.runContext.urlPattern !== ''; | ||
}); | ||
const frameRuleStart = rules.findIndex((r) => { | ||
return r.runContext?.frame === true; | ||
}); | ||
const frameRuleEnd = rules.findIndex((r, i) => { | ||
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart; | ||
}); |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findIndex returns -1 when no element is found, but this value is used directly as an array slice index. If all rules are generic (no urlPattern), genericRuleEnd will be -1, causing incorrect slicing behavior.
const genericRuleEnd = rules.findIndex((r) => { | |
return r.runContext?.urlPattern && r.runContext.urlPattern !== ''; | |
}); | |
const frameRuleStart = rules.findIndex((r) => { | |
return r.runContext?.frame === true; | |
}); | |
const frameRuleEnd = rules.findIndex((r, i) => { | |
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart; | |
}); | |
let genericRuleEnd = rules.findIndex((r) => { | |
return r.runContext?.urlPattern && r.runContext.urlPattern !== ''; | |
}); | |
if (genericRuleEnd === -1) genericRuleEnd = rules.length; | |
let frameRuleStart = rules.findIndex((r) => { | |
return r.runContext?.frame === true; | |
}); | |
if (frameRuleStart === -1) frameRuleStart = rules.length; | |
let frameRuleEnd = rules.findIndex((r, i) => { | |
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart; | |
}); | |
if (frameRuleEnd === -1) frameRuleEnd = rules.length; |
Copilot uses AI. Check for mistakes.
const frameRuleEnd = rules.findIndex((r, i) => { | ||
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart; | ||
}); |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to genericRuleEnd, frameRuleEnd can be -1 if no matching element is found, which will cause incorrect array slicing when used in slice() operations.
Copilot uses AI. Check for mistakes.
genericRuleRange: [number, number]; // [startIndex, endIndex] of rules that are generic | ||
frameRuleRange: [number, number]; // [startIndex, endIndex] of rules that run in frames | ||
specificRuleRange: [number, number]; // [startIndex, endIndex] of rules that are specific | ||
genericStringEnd: number; // end index of strings that are used by generic rules | ||
frameStringEnd: number; // end index of strings that are used by frame rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment about the order inside the strings and rules arrays? So it's easier to understand how these numbers correspond to the actual data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as I understand, all "end" indices are +1 (not inclusive on the right side), right? I'd make an explicit comment about that.
r: CompactCMPRule[]; | ||
}; | ||
|
||
export type IndexedCMPRuleset = CompactCMPRuleset & { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to keep CompactCMPRuleset
as a separate type?
const strings = buildStrings(existingCompactRules?.s || [], rules); | ||
export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): IndexedCMPRuleset { | ||
rules.sort((a, b) => { | ||
const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern || r.runContext.urlPattern === ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is equivalent, right?
const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern || r.runContext.urlPattern === ''; | |
const isGeneric = (r: AutoConsentCMPRule) => !r.runContext?.urlPattern; |
|
||
export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): CompactCMPRuleset { | ||
const strings = buildStrings(existingCompactRules?.s || [], rules); | ||
export function encodeRules(rules: AutoConsentCMPRule[], existingCompactRules: CompactCMPRuleset | null): IndexedCMPRuleset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment/jsdoc about the expected sorting?
}); | ||
|
||
const genericRuleEnd = rules.findIndex((r) => { | ||
return r.runContext?.urlPattern && r.runContext.urlPattern !== ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return r.runContext?.urlPattern && r.runContext.urlPattern !== ''; | |
return r.runContext?.urlPattern; |
const genericRuleEnd = rules.findIndex((r) => { | ||
return r.runContext?.urlPattern && r.runContext.urlPattern !== ''; | ||
}); | ||
const frameRuleStart = rules.findIndex((r) => { | ||
return r.runContext?.frame === true; | ||
}); | ||
const frameRuleEnd = rules.findIndex((r, i) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some tests for edge cases? Such as when there's no frame rules, or no specific rules etc.
return r.runContext?.frame === true; | ||
}); | ||
const frameRuleEnd = rules.findIndex((r, i) => { | ||
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is equivalent, isn't it?
return (!r.runContext || !r.runContext?.frame) && i >= frameRuleStart; | |
return !r.runContext?.frame && i >= frameRuleStart; |
const genericStrings = buildStrings(existingCompactRules?.s || [], rules.slice(0, genericRuleEnd)); | ||
const frameStrings = buildStrings(genericStrings, rules.slice(0, frameRuleEnd)); | ||
const strings = buildStrings(frameStrings || [], rules); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this (try to) preserve the indices in site-specific rules?
} | ||
|
||
function shouldRunRuleInContext(rule: CompactCMPRule, mainFrame: boolean, url: string): boolean { | ||
const runContext = rule[4]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change this 4 and the numbers below to constants or enum values? I think it'd be very difficult to understand for anyone new to the codebase.
testUrls.forEach(([cmp, testUrl]) => { | ||
const rule = decodedRules.find((rule) => rule.name === cmp); | ||
if (!rule) { | ||
// rule from the test list doesn't exist anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we regenerate the test urls on each run? Or is it not reliable?
Task/Issue URL: https://app.asana.com/1/137249556945/project/488551667048375/task/1211076317769725?focus=true
Description:
Steps to test this PR: