Skip to content

Conversation

@sammacbeth
Copy link
Collaborator

@sammacbeth sammacbeth commented Feb 2, 2026

Adds a Jenkins job that imports pixel data from Clickhouse, runs validation to generate rows for a validation summary table. This can be used for dashboards to measure pixel traffic and validation status, as well as identify issues with specific pixels.


Note

Medium Risk
Introduces new CI automation that reads/writes ClickHouse and changes validation output behavior, so failures or query/ingest issues could impact reporting and job stability. Core validation logic changes are small but the new data pipeline increases operational risk.

Overview
Adds a new scheduled, parameterized Jenkins pipeline (Jenkinsfile.dashboard) that checks out each client repo’s pixel definitions and runs a new scripts/detailedValidation.sh workflow per platform/date.

The workflow fetches aggregated pixels for a specific day from ClickHouse (fetch_pixels_for_day.mjs), runs validation while emitting per-input-row JSONL output (validate_live_pixel.mjs via DETAILED_VALIDATION_OUTPUT_FILE / DISABLE_RESULT_SAVING), and inserts those rows into a new ClickHouse table defined in live_validation_scripts/validation_table.sql.

Improves report usability by adding Grafana deep links to pixel names in Asana owner subtasks (asana_reports.mjs) and by extending LivePixelsValidator.validatePixel() to always include matched prefix (and pixelMatch) in returned state.

Written by Cursor Bugbot for commit 43aa502. This will update automatically on new commits. Configure here.

@sammacbeth sammacbeth requested a review from ladamski February 2, 2026 10:41
currentBuild.description = "Import: ${env.DATE_TO_USE}"
}
dir('pixel-schema') {
checkout([$class: 'GitSCM', branches: [[name: "sam/dashboard"]],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jenkins job checks out developer feature branch

High Severity

The pixel-schema repository checkout uses branch sam/dashboard which appears to be a personal feature branch rather than an official branch. All other repos in this Jenkinsfile correctly use main or develop. This means the production Jenkins job would be pulling from an unstable development branch instead of the main codebase.


Please tell me if this was useful or not with a 👍 or 👎.

Fix in Cursor Fix in Web

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix this just before merging so as not to break the current job.

)
ENGINE = ReplicatedReplacingMergeTree()
PARTITION BY (date)
ORDER BY (agent, prefix, params)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClickHouse table ORDER BY causes data deduplication

Medium Severity

The ReplicatedReplacingMergeTree table has ORDER BY (agent, prefix, params) but stores distinct pixel, pixel_id, and version values in separate columns. Since these columns aren't in the ORDER BY, rows with different pixels that share the same agent, prefix, and params will be deduplicated during ClickHouse merges, causing silent data loss including loss of freq aggregations.


Please tell me if this was useful or not with a 👍 or 👎.

Fix in Cursor Fix in Web

userRemoteConfigs: [[url: 'https://github.com/duckduckgo/duckduckgo-privacy-extension.git']]])
}
dir('pixel-schema') {
sh "/bin/bash scripts/detailedValidation.sh ../duckduckgo-privacy-extension/pixel-definitions/ ${env.DATE_TO_USE}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command injection via unvalidated DATE parameter

High Severity

The DATE parameter is used directly in shell commands without validation. A malicious value like 2024-01-01; malicious_command would be interpreted by the shell, executing arbitrary commands on the Jenkins agent. The parameter description suggests YYYY-MM-DD format but doesn't enforce it, allowing command injection via ${env.DATE_TO_USE} in sh steps.


Please tell me if this was useful or not with a 👍 or 👎.

Additional Locations (2)

Fix in Cursor Fix in Web

`;
const result = spawnSync('ddg-ro-ch', ['-h', 'clickhouse', '--query', query]);
console.log(result.stderr.toString());
console.log(result.stdout.toString());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing exit code check allows silent query failures

High Severity

The spawnSync call to execute the ClickHouse query doesn't check result.status or result.error. When the query fails (network issues, permission errors, syntax errors), the script logs output but exits with code 0. The shell script's set -e won't catch this, so validation continues with a missing or empty CSV file, producing incorrect or empty results in the dashboard table.

Fix in Cursor Fix in Web

}
const [prefix, pixelMatch] = matchPixel(pixel, this.#compiledPixels);
this.#currentPixelState.prefix = prefix;
this.#currentPixelState.pixelMatch = pixelMatch || '';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused property assignment for pixelMatch

Low Severity

this.#currentPixelState.pixelMatch is assigned but never read anywhere in the codebase. While result.prefix is used in validate_live_pixel.mjs, result.pixelMatch is not accessed by any consumer.

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.


SELECT date, agent, version, pixel_id, pixel, params_fixed AS params, COUNT(*) AS freq
FROM metrics.pixels
WHERE date = '${day}' AND
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection risk in date parameter interpolation

Medium Severity

The day parameter from user input is directly interpolated into the SQL query string at WHERE date = '${day}' without sanitization or parameterization. While the Jenkins job is internal and the Clickhouse client appears read-only (ddg-ro-ch), a user with Jenkins access could provide a malicious DATE parameter to manipulate the query (e.g., 2024-01-01' OR '1'='1).

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant