
Conversation

afterxleep
Contributor

@afterxleep afterxleep commented Oct 14, 2025

Task/Issue URL: https://app.asana.com/1/137249556945/task/1211159858495831
Tech Design URL:
CC:

Description

  • Integrates Safari Performance Tests
  • Outputs Safari test execution logs to macOS console
  • Automatically calculates required iterations for best results
  • Allows setting a maximum number of iterations
  • Executes the Safari runtime from a /tmp directory so npm can install dependencies
  • Better error handling for Node errors
  • Tries common node and npm paths to locate the Node installation
  • Better error handling for manual cancellation errors
  • Fixes deallocation crashes and runs all tests in clean tabs

Testing Steps

  1. Make sure you have Node.js installed and Safari automation enabled
  2. Go to a Website
  3. Navigate to Debug Menu > Test Site Performance
  4. Select a maximum of 10 iterations so the test does not take too long
  5. Confirm both DDG and Safari tests run and results are presented correctly.

Impact and Risks

N/A (internal-only tool)

Notes to Reviewer

The PR is a bit big, as I removed plenty of dead code and refactored some of the perf logic.


Internal references:

Definition of Done | Engineering Expectations | Tech Design Template


Note

Adds Safari-based performance testing and a DDG vs Safari comparison workflow with adaptive iterations, robust error handling, JSON export, and UI updates.

  • Performance Test Core:
    • Safari Runner: Enhances SafariTestRunner with temp runtime setup, npm/node discovery, dependency install, stderr capture, detailed/LocalizedError cases, progress parsing, and maxIterations support.
    • Site Tester: Refactors SitePerformanceTester to use per-iteration clean tabs, JS warmup, scroll-driven LCP triggers, use webView’s data store, consistency/IQR-based early stop, and richer metric collection/helpers.
    • Parsing/Models: Adds SafariResultsParser, BrowserComparisonResults (with significance/winner logic and JSON export), and metric utilities (IQR/median/outlier filtering).
  • JS Runner (SafariTestRunner JS):
    • CLI/Config now accept maxIterations, increase timeouts, add adaptive iterations with consistency checks, warmup navigation, improved logging, and window sizing; WebDriverService tweaks and guidance for Remote Automation.
  • UI:
    • New comparison UI (ComparisonMetricBox) and updated PerformanceTestWindowView to show DDG vs Safari metrics, browser progress, max-iteration picker, error display, and export-to-JSON.
    • PerformanceTestViewModel orchestrates DDG test, Safari test (via runner), cleanup/new-tab flow; adds progress/status/error handling.
  • App Integration:
    • MainViewController.testCurrentSitePerformance now passes createNewTab/closeTab to PerformanceTestWindowController for clean-tab runs; removes separate Safari-only path.
    • Debug menu label updated to Test Site Performance (DDG vs Safari).
  • Tests:
    • Mocks generate JSON results; unit tests updated for new parsing, errors, and properties.
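
The consistency/IQR-based early stop mentioned in the summary can be sketched as follows (an illustrative standalone version, not the PR's actual code; all function names and thresholds are hypothetical):

```javascript
// Hypothetical sketch of IQR/median-based adaptive iterations.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function iqr(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const quantile = (p) => {
    const idx = (sorted.length - 1) * p;
    const lo = Math.floor(idx);
    const hi = Math.ceil(idx);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
  };
  return quantile(0.75) - quantile(0.25);
}

// Run iterations until results are consistent (IQR/median under a threshold)
// or maxIterations is reached.
async function runAdaptive(measureOnce, { minIterations = 3, maxIterations = 10, threshold = 0.15 } = {}) {
  const samples = [];
  for (let i = 0; i < maxIterations; i++) {
    samples.push(await measureOnce());
    if (samples.length >= minIterations) {
      const consistency = iqr(samples) / median(samples);
      if (consistency < threshold) break; // results are stable: stop early
    }
  }
  return { samples, median: median(samples) };
}
```

The idea is that a stable page needs only a few iterations, while a noisy one keeps running up to the user-selected maximum.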

Written by Cursor Bugbot for commit 5bf4b57. This will update automatically on new commits.


github-actions bot commented Oct 14, 2025

Warnings
⚠️ PR has 2907 lines of added code (excluding Xcode projects and assets). Consider splitting into smaller PRs if possible.

Generated by 🚫 dangerJS against 5bf4b57

@afterxleep afterxleep marked this pull request as ready for review October 15, 2025 13:50
@afterxleep afterxleep changed the title from "Safari Performance integration" to "Safari Performance Test integration + Cleanup" Oct 15, 2025
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

Integrates Safari performance testing into the internal PerformanceTest tool, adds adaptive iterations, richer error handling, and UI to compare DuckDuckGo vs Safari results.

  • Adds a Node/WebDriver-based Safari runner with adaptive iteration logic and temp runtime setup
  • Updates SwiftUI views and view models to display results, comparison metrics, and export to JSON
  • Refactors error handling and parsing, and extends tests for error cases

Reviewed Changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.

File Description
macOS/LocalPackages/PerformanceTest/Tests/PerformanceTestTests/SafariPerformanceTestViewModelTests.swift Updates tests to new properties and adds error case assertions for Safari runner
macOS/LocalPackages/PerformanceTest/Tests/PerformanceTestTests/Mocks/MockSafariTestExecutor.swift Mock executor now writes a mock JSON result to disk for parsing
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/SafariPerformanceTestWindowView.swift Replaces completion path with a detailed results view using parsed metrics
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/SafariPerformanceTestViewModel.swift Refactors test execution to parse JSON results, unifies error handling, and adds stat view selection
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestWindowView.swift Adds comparison UI (DDG vs Safari), error banners, and export action
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestWindowController.swift Injects tab lifecycle callbacks to create/close clean tabs during tests
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestViewModel.swift Orchestrates end-to-end comparison run, including overlays, tab lifecycle, logging, and JSON export
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/PerformanceTestConstants.swift Updates defaults (iterations, timeout) and adds new stat view keys (IQR)
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/UI/ComparisonMetricBox.swift New UI component showing side-by-side metrics and significance indication
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/services/WebDriverService.js Improves Safari WebDriver setup and error guidance
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/core/TestExecutor.js Adds warmup navigation and waiting for readiness
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/core/PerformanceTestRunner.js Implements adaptive iterations with consistency checks (IQR/median and P95/P50)
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/SafariTestRunner/src/core/Configuration.js Adds maxIterations and extends default timeout
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/TestResult.swift Removes unused display/computed helpers
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/PerformanceMetrics.swift Removes computed scoring/formatting helpers
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/DetailedPerformanceMetrics.swift Removes ancillary computed helpers; retains performance scoring
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Models/BrowserComparisonResults.swift New model for comparing browsers, computing differences, significance, and JSON export
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Core/SitePerformanceTester.swift Runs DDG tests with cache clearing, warmups, adaptive iterations, and outlier filtering
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Core/SafariTestRunner.swift Runs Safari Node runner from a temp env, finds node/npm, parses progress, and captures error output
macOS/LocalPackages/PerformanceTest/Sources/PerformanceTest/Core/SafariResultsParser.swift New parser for Safari JSON results into PerformanceTestResults
macOS/DuckDuckGo/Menus/MainMenu.swift Updates menu title to reflect DDG vs Safari test; removes separate Safari-only option
macOS/DuckDuckGo/MainWindow/MainViewController.swift Wires performance test window with tab creation/closure for clean runs


Comment on lines +218 to +229
func testErrorDetails_npmInstallFailed_returnsCorrectMessage() async {
    let url = URL(string: "https://example.com")!
    let viewModel = SafariPerformanceTestViewModel(url: url, runnerFactory: { url, iterations in
        let mock = MockSafariTestExecutor(url: url, iterations: iterations)
        mock.shouldThrowError = SafariTestRunner.RunnerError.npmInstallFailed
        return mock
    })

    await viewModel.runTest()

    XCTAssertEqual(viewModel.errorMessage, "Failed to install npm dependencies. Check Console.app for details.")
    XCTAssertEqual(viewModel.statusText, "Error: npm install failed")

Copilot AI Oct 15, 2025


RunnerError.npmInstallFailed no longer exists; the new cases are npmNotFound and dependenciesInstallFailed(String). Update the test to use .dependenciesInstallFailed with a sample error string, and assert against the new message/status returned by errorDetails(for:). For example, expect errorMessage 'Failed to install test dependencies:\n\n' and statusText 'Error: Dependency installation failed'.

Suggested change
-func testErrorDetails_npmInstallFailed_returnsCorrectMessage() async {
-    let url = URL(string: "https://example.com")!
-    let viewModel = SafariPerformanceTestViewModel(url: url, runnerFactory: { url, iterations in
-        let mock = MockSafariTestExecutor(url: url, iterations: iterations)
-        mock.shouldThrowError = SafariTestRunner.RunnerError.npmInstallFailed
-        return mock
-    })
-    await viewModel.runTest()
-    XCTAssertEqual(viewModel.errorMessage, "Failed to install npm dependencies. Check Console.app for details.")
-    XCTAssertEqual(viewModel.statusText, "Error: npm install failed")
+func testErrorDetails_dependenciesInstallFailed_returnsCorrectMessage() async {
+    let url = URL(string: "https://example.com")!
+    let sampleErrorOutput = "npm ERR! some error output"
+    let viewModel = SafariPerformanceTestViewModel(url: url, runnerFactory: { url, iterations in
+        let mock = MockSafariTestExecutor(url: url, iterations: iterations)
+        mock.shouldThrowError = SafariTestRunner.RunnerError.dependenciesInstallFailed(sampleErrorOutput)
+        return mock
+    })
+    await viewModel.runTest()
+    XCTAssertEqual(viewModel.errorMessage, "Failed to install test dependencies:\n\n\(sampleErrorOutput)")
+    XCTAssertEqual(viewModel.statusText, "Error: Dependency installation failed")



await viewModel.runTest()

XCTAssertEqual(viewModel.errorMessage, "npm not found. Please install npm to run Safari performance tests.")

Copilot AI Oct 15, 2025


The expected errorMessage doesn't match the ViewModel's mapping. SafariPerformanceTestViewModel now returns 'npm not found. Please install Node.js (which includes npm) to run Safari performance tests.' Update the assertion to match the new text.

Suggested change
-XCTAssertEqual(viewModel.errorMessage, "npm not found. Please install npm to run Safari performance tests.")
+XCTAssertEqual(viewModel.errorMessage, "npm not found. Please install Node.js (which includes npm) to run Safari performance tests.")


Comment on lines +220 to +225
logger.debug("Opening fresh tab with original URL to clean up test state")
guard let newWebView = await createNewTab() else { return }

webView = newWebView
logger.debug("Created fresh tab, closing old test tab")
await closeTab()

Copilot AI Oct 15, 2025


This sequence closes the currently selected tab, which (after createNewTab selects it) is the new tab, leaving the old test tab open. Reverse the order or explicitly target the old tab so you close the test tab and keep the fresh one. Example: call closeTab() first, then createNewTab(selected: true) and update webView.

Suggested change
-logger.debug("Opening fresh tab with original URL to clean up test state")
-guard let newWebView = await createNewTab() else { return }
-webView = newWebView
-logger.debug("Created fresh tab, closing old test tab")
-await closeTab()
+logger.debug("Closing old test tab before opening fresh tab to clean up test state")
+await closeTab()
+guard let newWebView = await createNewTab() else { return }
+webView = newWebView
+logger.debug("Created fresh tab after closing old test tab")

Comment on lines +59 to +75
public func timeBasedWinner(_ duckduckgoValue: Double, _ safariValue: Double, ddgStdDev: Double = 0, safariStdDev: Double = 0) -> BrowserWinner {
    // Check if the difference is statistically significant
    let diff = abs(duckduckgoValue - safariValue)

    // If we have standard deviation data, use it
    if ddgStdDev > 0 || safariStdDev > 0 {
        // Use proper quadrature for combining standard deviations: sqrt(σ1² + σ2²)
        let combinedStdDev = sqrt(ddgStdDev * ddgStdDev + safariStdDev * safariStdDev)

        // Use 1.96 multiplier for 95% confidence interval (two-tailed test)
        let confidenceMargin = 1.96 * combinedStdDev

        // Difference must be greater than confidence margin to be significant
        if diff < confidenceMargin {
            return .tie
        }
    } else {

Copilot AI Oct 15, 2025


The parameters passed here are IQR values (from the UI), but the calculation treats them as standard deviations, which misstates significance. Convert IQR to an approximate standard deviation (σ ≈ IQR/1.349) before combining, or change the comparison to use a threshold derived from IQR directly. Apply the same fix in sizeBasedWinner.

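
The conversion the reviewer suggests can be sketched like this (illustrative only; it assumes the spread inputs are IQRs and uses the normal-distribution approximation σ ≈ IQR/1.349; all names are hypothetical, not the PR's actual API):

```javascript
// Hypothetical sketch: treat incoming spread values as IQRs, convert them to
// approximate standard deviations, then apply the quadrature + 1.96 margin.
const IQR_TO_SIGMA = 1 / 1.349; // σ ≈ IQR / 1.349 for a normal distribution

function isSignificant(valueA, valueB, iqrA, iqrB) {
  const diff = Math.abs(valueA - valueB);
  const sigmaA = iqrA * IQR_TO_SIGMA;
  const sigmaB = iqrB * IQR_TO_SIGMA;
  // Combine independent spreads in quadrature: sqrt(σA² + σB²)
  const combined = Math.sqrt(sigmaA * sigmaA + sigmaB * sigmaB);
  // Difference must exceed the 95% two-tailed confidence margin
  return diff > 1.96 * combined;
}
```

Without the IQR-to-σ conversion, the margin is roughly 1.35× too wide, so real differences can be misreported as ties.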

loadTimes: loadTimes,
detailedMetrics: collectedMetrics,
failedAttempts: failedAttempts,
iterations: jsonResult.testConfiguration.iterations,

Copilot AI Oct 15, 2025


iterations should reflect actual completed iterations (e.g., successfulIterations.count) rather than the configured iterations. Using the configured count can be misleading when adaptive stopping or failures occur.

Suggested change
-iterations: jsonResult.testConfiguration.iterations,
+iterations: successfulIterations.count,


Comment on lines +269 to +276
// Check for consistency metrics
if cleanLine.contains("Consistency:") {
    // Extract just the metrics part
    if let metricsRange = cleanLine.range(of: "Consistency: ") {
        let metrics = String(cleanLine[metricsRange.upperBound...])
        return (nil, metrics)
    }
}

Copilot AI Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Node runner logs 'Consistency metrics: ...', not 'Consistency:'. Update the substring check to match the actual log prefix so these lines surface cleanly in status without falling back to the full raw line.


Comment on lines 116 to 117
progress = Double(iteration) / Double(total)
}

Copilot AI Oct 15, 2025


Progress is being reset to 0 on status-only updates because parseProgressLog emits iteration=0 for non-iteration lines. Only update progress when iteration > 0 to avoid regressions during status messages.

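
A minimal sketch of the guard this comment asks for (the shape of the parsed log line is an assumption, not the PR's actual Swift code):

```javascript
// Hypothetical sketch: only move the progress bar when the parsed log line
// actually carries an iteration count; status-only lines leave it untouched.
function updateProgress(state, parsed) {
  // parsed: { iteration, total, status } where iteration is 0 for
  // status-only lines that carry no iteration information.
  if (parsed.iteration > 0 && parsed.total > 0) {
    state.progress = parsed.iteration / parsed.total;
  }
  if (parsed.status) {
    state.status = parsed.status;
  }
  return state;
}
```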

