Safari Performance Test integration + Cleanup #2202
base: main
Conversation
Pull Request Overview
Integrates Safari performance testing into the internal PerformanceTest tool, adds adaptive iterations, richer error handling, and UI to compare DuckDuckGo vs Safari results.
- Adds a Node/WebDriver-based Safari runner with adaptive iteration logic and temp runtime setup
- Updates SwiftUI views and view models to display results, comparison metrics, and export to JSON
- Refactors error handling and parsing, and extends tests for error cases
Reviewed Changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.
File | Description |
---|---|
macOS/LocalPackages/PerformanceTest/Tests/PerformanceTestTests/SafariPerformanceTestViewModelTests.swift | Updates tests to new properties and adds error case assertions for Safari runner |
macOS/LocalPackages/PerformanceTest/Tests/PerformanceTestTests/Mocks/MockSafariTestExecutor.swift | Mock executor now writes a mock JSON result to disk for parsing |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/SafariPerformanceTestWindowView.swift | Replaces completion path with a detailed results view using parsed metrics |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/SafariPerformanceTestViewModel.swift | Refactors test execution to parse JSON results, unifies error handling, and adds stat view selection |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestWindowView.swift | Adds comparison UI (DDG vs Safari), error banners, and export action |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestWindowController.swift | Injects tab lifecycle callbacks to create/close clean tabs during tests |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestViewModel.swift | Orchestrates end-to-end comparison run, including overlays, tab lifecycle, logging, and JSON export |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestConstants.swift | Updates defaults (iterations, timeout) and adds new stat view keys (IQR) |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/ComparisonMetricBox.swift | New UI component showing side-by-side metrics and significance indication |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/services/WebDriverService.js | Improves Safari WebDriver setup and error guidance |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/core/TestExecutor.js | Adds warmup navigation and waiting for readiness |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/core/PerformanceTestRunner.js | Implements adaptive iterations with consistency checks (IQR/median and P95/P50) |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/core/Configuration.js | Adds maxIterations and extends default timeout |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/TestResult.swift | Removes unused display/computed helpers |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/PerformanceMetrics.swift | Removes computed scoring/formatting helpers |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/DetailedPerformanceMetrics.swift | Removes ancillary computed helpers; retains performance scoring |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/BrowserComparisonResults.swift | New model for comparing browsers, computing differences, significance, and JSON export |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Core/SitePerformanceTester.swift | Runs DDG tests with cache clearing, warmups, adaptive iterations, and outlier filtering |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Core/SafariTestRunner.swift | Runs Safari Node runner from a temp env, finds node/npm, parses progress, and captures error output |
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Core/SafariResultsParser.swift | New parser for Safari JSON results into PerformanceTestResults |
macOS/DuckDuckGo/Menus/MainMenu.swift | Updates menu title to reflect DDG vs Safari test; removes separate Safari-only option |
macOS/DuckDuckGo/MainWindow/MainViewController.swift | Wires performance test window with tab creation/closure for clean runs |
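The adaptive-iteration logic in `PerformanceTestRunner.js` (stop early once results look consistent, judged by IQR/median and P95/P50 ratios) can be sketched roughly as follows. The thresholds, minimum sample size, and helper names here are illustrative assumptions, not values taken from the PR:

```javascript
// Nearest-rank percentile on a pre-sorted array (illustrative helper).
function percentile(sorted, p) {
  const idx = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
  return sorted[idx];
}

// Consistency check: samples are "stable enough" to stop iterating early
// when both spread ratios fall under the (assumed) thresholds.
function isConsistent(samples, maxIqrRatio = 0.2, maxP95Ratio = 1.5) {
  if (samples.length < 5) return false; // need a minimum sample size first
  const sorted = [...samples].sort((a, b) => a - b);
  const median = percentile(sorted, 0.5);
  const iqr = percentile(sorted, 0.75) - percentile(sorted, 0.25);
  const p95 = percentile(sorted, 0.95);
  return iqr / median <= maxIqrRatio && p95 / median <= maxP95Ratio;
}
```

A runner built this way keeps iterating up to `maxIterations` and breaks out of the loop as soon as `isConsistent(loadTimes)` returns true.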
```swift
func testErrorDetails_npmInstallFailed_returnsCorrectMessage() async {
    let url = URL(string: "https://example.com")!
    let viewModel = SafariPerformanceTestViewModel(url: url, runnerFactory: { url, iterations in
        let mock = MockSafariTestExecutor(url: url, iterations: iterations)
        mock.shouldThrowError = SafariTestRunner.RunnerError.npmInstallFailed
        return mock
    })

    await viewModel.runTest()

    XCTAssertEqual(viewModel.errorMessage, "Failed to install npm dependencies. Check Console.app for details.")
    XCTAssertEqual(viewModel.statusText, "Error: npm install failed")
}
```
Copilot AI · Oct 15, 2025
RunnerError.npmInstallFailed no longer exists; the new cases are npmNotFound and dependenciesInstallFailed(String). Update the test to use .dependenciesInstallFailed with a sample error string, and assert against the new message/status returned by errorDetails(for:). For example, expect errorMessage 'Failed to install test dependencies:\n\n' and statusText 'Error: Dependency installation failed'.
Suggested change:

```swift
func testErrorDetails_dependenciesInstallFailed_returnsCorrectMessage() async {
    let url = URL(string: "https://example.com")!
    let sampleErrorOutput = "npm ERR! some error output"
    let viewModel = SafariPerformanceTestViewModel(url: url, runnerFactory: { url, iterations in
        let mock = MockSafariTestExecutor(url: url, iterations: iterations)
        mock.shouldThrowError = SafariTestRunner.RunnerError.dependenciesInstallFailed(sampleErrorOutput)
        return mock
    })

    await viewModel.runTest()

    XCTAssertEqual(viewModel.errorMessage, "Failed to install test dependencies:\n\n\(sampleErrorOutput)")
    XCTAssertEqual(viewModel.statusText, "Error: Dependency installation failed")
}
```
```swift
await viewModel.runTest()

XCTAssertEqual(viewModel.errorMessage, "npm not found. Please install npm to run Safari performance tests.")
```
Copilot AI · Oct 15, 2025
The expected errorMessage doesn't match the ViewModel's mapping. SafariPerformanceTestViewModel now returns 'npm not found. Please install Node.js (which includes npm) to run Safari performance tests.' Update the assertion to match the new text.
Suggested change:

```swift
XCTAssertEqual(viewModel.errorMessage, "npm not found. Please install Node.js (which includes npm) to run Safari performance tests.")
```
```swift
logger.debug("Opening fresh tab with original URL to clean up test state")
guard let newWebView = await createNewTab() else { return }

webView = newWebView
logger.debug("Created fresh tab, closing old test tab")
await closeTab()
```
Copilot AI · Oct 15, 2025
This sequence closes the currently selected tab, which (after createNewTab selects it) is the new tab, leaving the old test tab open. Reverse the order or explicitly target the old tab so you close the test tab and keep the fresh one. Example: call closeTab() first, then createNewTab(selected: true) and update webView.
Suggested change:

```swift
logger.debug("Closing old test tab before opening fresh tab to clean up test state")
await closeTab()
guard let newWebView = await createNewTab() else { return }
webView = newWebView
logger.debug("Created fresh tab after closing old test tab")
```
```swift
public func timeBasedWinner(_ duckduckgoValue: Double, _ safariValue: Double, ddgStdDev: Double = 0, safariStdDev: Double = 0) -> BrowserWinner {
    // Check if the difference is statistically significant
    let diff = abs(duckduckgoValue - safariValue)

    // If we have standard deviation data, use it
    if ddgStdDev > 0 || safariStdDev > 0 {
        // Use proper quadrature for combining standard deviations: sqrt(σ1² + σ2²)
        let combinedStdDev = sqrt(ddgStdDev * ddgStdDev + safariStdDev * safariStdDev)

        // Use 1.96 multiplier for 95% confidence interval (two-tailed test)
        let confidenceMargin = 1.96 * combinedStdDev

        // Difference must be greater than confidence margin to be significant
        if diff < confidenceMargin {
            return .tie
        }
    } else {
```
Copilot AI · Oct 15, 2025
The parameters passed here are IQR values (from the UI), but the calculation treats them as standard deviations, which misstates significance. Convert IQR to an approximate standard deviation (σ ≈ IQR/1.349) before combining, or change the comparison to use a threshold derived from IQR directly. Apply the same fix in sizeBasedWinner.
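A minimal sketch of the fix this comment suggests, in the runner's JavaScript for illustration: treat the incoming spread values as IQRs and convert each to an approximate standard deviation (σ ≈ IQR / 1.349, which assumes roughly normal data) before combining in quadrature and applying the 1.96 (95% CI) margin. The function shape and return values here are assumptions, not the PR's actual API:

```javascript
// Hypothetical significance check taking IQRs rather than std devs.
function timeBasedWinner(ddgValue, safariValue, ddgIQR = 0, safariIQR = 0) {
  const diff = Math.abs(ddgValue - safariValue);
  if (ddgIQR > 0 || safariIQR > 0) {
    // Convert IQR to an approximate standard deviation: sigma ≈ IQR / 1.349.
    const ddgStdDev = ddgIQR / 1.349;
    const safariStdDev = safariIQR / 1.349;
    // Combine in quadrature, then apply the 95% two-tailed margin.
    const combined = Math.sqrt(ddgStdDev ** 2 + safariStdDev ** 2);
    if (diff < 1.96 * combined) return "tie";
  }
  return ddgValue < safariValue ? "duckduckgo" : "safari";
}
```

The same conversion would apply to `sizeBasedWinner`, as the comment notes.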
```swift
loadTimes: loadTimes,
detailedMetrics: collectedMetrics,
failedAttempts: failedAttempts,
iterations: jsonResult.testConfiguration.iterations,
```
Copilot AI · Oct 15, 2025
iterations should reflect actual completed iterations (e.g., successfulIterations.count) rather than the configured iterations. Using the configured count can be misleading when adaptive stopping or failures occur.
Suggested change:

```swift
iterations: successfulIterations.count,
```
```swift
// Check for consistency metrics
if cleanLine.contains("Consistency:") {
    // Extract just the metrics part
    if let metricsRange = cleanLine.range(of: "Consistency: ") {
        let metrics = String(cleanLine[metricsRange.upperBound...])
        return (nil, metrics)
    }
}
```
Copilot AI · Oct 15, 2025
The Node runner logs 'Consistency metrics: ...', not 'Consistency:'. Update the substring check to match the actual log prefix so these lines surface cleanly in status without falling back to the full raw line.
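The corrected prefix match can be sketched like this (in JavaScript for illustration; the actual parser is Swift, and the example log payload is an assumption based on the review comment, not copied from the PR):

```javascript
// Extract the metrics payload from a runner log line such as
// "Consistency metrics: IQR/median=0.12, P95/P50=1.4".
// Returns null for lines without the prefix so callers can fall back.
function extractConsistencyMetrics(line) {
  const prefix = "Consistency metrics: ";
  const index = line.indexOf(prefix);
  if (index === -1) return null;
  return line.slice(index + prefix.length);
}
```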
```swift
progress = Double(iteration) / Double(total)
}
```
Copilot AI · Oct 15, 2025
Progress is being reset to 0 on status-only updates because parseProgressLog emits iteration=0 for non-iteration lines. Only update progress when iteration > 0 to avoid regressions during status messages.
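The guard this comment asks for amounts to ignoring iteration 0 when updating progress. A small sketch (illustrative names, written in the runner's JavaScript rather than the view model's Swift):

```javascript
// Only advance progress for real iteration updates; status-only log lines
// (parsed as iteration 0) keep the last known progress instead of
// resetting the bar to zero.
function updateProgress(current, iteration, total) {
  if (iteration > 0 && total > 0) {
    return iteration / total;
  }
  return current; // status-only update: keep previous progress
}
```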
Task/Issue URL: https://app.asana.com/1/137249556945/task/1211159858495831
Tech Design URL:
CC:
Description
Testing Steps
Impact and Risks
N/A (internal-only tool)
Notes to Reviewer
PR is a bit big, as I removed plenty of dead code and refactored some of the perf logic.
Internal references:
Definition of Done | Engineering Expectations | Tech Design Template
Note
Adds Safari-based performance testing and a DDG vs Safari comparison workflow with adaptive iterations, robust error handling, JSON export, and UI updates.
- `SafariTestRunner` with temp runtime setup, npm/node discovery, dependency install, stderr capture, detailed `LocalizedError` cases, progress parsing, and `maxIterations` support.
- `SitePerformanceTester` updated to use per-iteration clean tabs, JS warmup, scroll-driven LCP triggers, the webView's data store, consistency/IQR-based early stop, and richer metric collection/helpers.
- New `SafariResultsParser`, `BrowserComparisonResults` (with significance/winner logic and JSON export), and metric utilities (IQR/median/outlier filtering).
- Node runner: adds `maxIterations`, increases timeouts, and adds adaptive iterations with consistency checks, warmup navigation, improved logging, and window sizing; `WebDriverService` tweaks and guidance for Remote Automation.
- New UI (`ComparisonMetricBox`) and updated `PerformanceTestWindowView` to show DDG vs Safari metrics, browser progress, max-iteration picker, error display, and export-to-JSON.
- `PerformanceTestViewModel` orchestrates the DDG test, the Safari test (via runner), and the cleanup/new-tab flow; adds progress/status/error handling.
- `MainViewController.testCurrentSitePerformance` now passes `createNewTab`/`closeTab` to `PerformanceTestWindowController` for clean-tab runs; removes the separate Safari-only path. Menu item is now `Test Site Performance (DDG vs Safari)`.

Written by Cursor Bugbot for commit 5bf4b57. This will update automatically on new commits.