Implemented end-to-end coverage reporting for dev evals #110
base: main
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main     #110   +/-   ##
=======================================
  Coverage   87.82%   87.82%
=======================================
  Files          19       19
  Lines         427      427
  Branches      122      122
=======================================
  Hits          375      375
  Misses          8        8
  Partials       44       44
```

☔ View full report in Codecov by Sentry.
Force-pushed from c432181 to 421035d
Pull request overview
This PR implements comprehensive coverage reporting for dev evaluations, adding full end-to-end coverage tracking with visual presentation. The implementation decomposes the test flow into focused, reusable helpers (run tests, parse results, write artifacts, compute coverage), adds a new Coverage UI page with syntax-highlighted, expandable per-file views with line/branch indicators, and includes CLI reporting of coverage percentages.
Key changes:
- Added istanbul-lib-coverage integration for coverage computation
- Created modular test pipeline with separate concerns (run, parse, write, coverage); see the sketch after this list
- Added interactive Coverage UI component with syntax highlighting via react-syntax-highlighter
- Coverage metrics are conditionally collected only for dev evaluations
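
As a rough illustration of how these pieces fit together, here is a hypothetical sketch of the orchestration in test-stories.ts; the module paths come from the file table below, but the function names, signatures, and ExperimentArgs fields are assumptions rather than the PR's actual code:

```ts
// Hypothetical sketch only: names and signatures are assumed, not taken from the PR.
import { runTests } from './run-tests';
import { parseTestResults } from './parse-tests';
import { writeStoryArtifacts } from './write-story-artifacts';
import { computeCoverage } from './coverage';
import { isDevEvaluation } from '../context-utils';
import type { ExperimentArgs } from '../../types';

export async function testStories(args: ExperimentArgs): Promise<void> {
  // Coverage is only collected for dev evaluations.
  const collectCoverage = isDevEvaluation(args.context);
  const testScript = collectCoverage ? 'eval:test:coverage' : 'eval:test';

  await runTests(args, testScript);             // run the vitest suite
  const results = await parseTestResults(args); // parse results/tests.json
  await writeStoryArtifacts(args, results);     // per-story results + a11y violations

  if (collectCoverage) {
    await computeCoverage(args);                // compute and write coverage artifacts
  }
}
```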
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| eval/lib/evaluations/coverage.ts | New module to compute coverage from istanbul data, extract per-file metrics, and write coverage artifacts |
| eval/lib/evaluations/parse-tests.ts | Extracted test result parsing logic from test-stories.ts into a focused helper |
| eval/lib/evaluations/run-tests.ts | Extracted test execution logic into a reusable function |
| eval/lib/evaluations/write-story-artifacts.ts | Extracted artifact writing logic for story results and a11y violations |
| eval/lib/evaluations/result-types.ts | New shared types for test results, coverage summary, and a11y violations |
| eval/lib/evaluations/test-stories.ts | Refactored to orchestrate the new modular helpers with conditional coverage collection |
| eval/lib/context-utils.ts | New utility to identify dev evaluations |
| eval/templates/result-docs/coverage.tsx | New React component for rendering coverage with syntax highlighting and interactive file expansion |
| eval/templates/result-docs/summary.tsx | Updated to display coverage metrics in the summary UI |
| eval/templates/evaluation/results/coverage.mdx | New MDX page to render coverage in Storybook docs |
| eval/templates/project/package.json | Added eval:test:coverage script for coverage collection |
| eval/types/istanbul-lib-coverage.d.ts | Type definitions for istanbul-lib-coverage API |
| eval/types.ts | Added coverage field to EvaluationSummary type |
| eval/tsconfig.json | Added types directory to includes |
| eval/package.json | Added istanbul-lib-coverage and react-syntax-highlighter dependencies |
| eval/eval.ts | Added CLI output for coverage metrics |
| eval/lib/save/google-sheet.ts | Added coverageLines field to sheets data |
| eval/README.md | Updated documentation to reflect new coverage metric |
| pnpm-lock.yaml | Lockfile updates for new dependencies |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
eval/lib/evaluations/coverage.ts:87
- This use of variable 'normalizedTotal' always evaluates to true.
```ts
if (normalizedTotal) {
```
```ts
    cwd: projectPath,
  },
});
await runTests({ projectPath, resultsPath } as ExperimentArgs, testScript);
```
Copilot AI commented on Dec 11, 2025
The cast to ExperimentArgs is potentially unsafe as it only passes projectPath and resultsPath but the ExperimentArgs type requires additional fields (experimentPath, evalPath, verbose, hooks, uploadId, evalName, context, agent). Consider either creating a minimal type for runTests parameters or ensuring all required fields are passed.
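
One way to address this, sketched under the assumption that runTests only reads projectPath and resultsPath (the Pick-based RunTestsArgs type is hypothetical, not part of the PR):

```ts
// Hypothetical narrowed parameter type: declare only what runTests actually needs,
// so call sites don't have to cast a partial object to the full ExperimentArgs.
type RunTestsArgs = Pick<ExperimentArgs, 'projectPath' | 'resultsPath'>;

export async function runTests(
  { projectPath, resultsPath }: RunTestsArgs,
  testScript: string,
): Promise<void> {
  // ... spawn `testScript` with cwd set to projectPath, writing under resultsPath
}

// The call site then type-checks without the unsafe `as ExperimentArgs` cast:
await runTests({ projectPath, resultsPath }, testScript);
```

Since ExperimentArgs is structurally a superset of RunTestsArgs, callers that already hold a full ExperimentArgs can still pass it directly.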
Co-authored-by: Copilot <[email protected]>
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 6 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
eval/lib/evaluations/coverage.ts:87
- This use of variable 'normalizedTotal' always evaluates to true.
```ts
coverage = {
```
```ts
storyAssertions[storyId] = {
  status: assertionResult.status,
```
Copilot AI commented on Dec 11, 2025
The storyAssertions object will only keep the status of the last assertion for each storyId if multiple assertions have the same storyId. This could lead to incorrect test counts if a story has multiple test assertions. Consider whether this is the intended behavior, or if you should aggregate statuses (e.g., mark as failed if any assertion fails) or track all assertions per story.
Suggested change:

```diff
- storyAssertions[storyId] = {
-   status: assertionResult.status,
+ // Aggregate statuses: if any assertion fails, mark as failed; otherwise, keep the worst status
+ const prevStatus = storyAssertions[storyId]?.status;
+ let newStatus = assertionResult.status;
+ if (prevStatus) {
+   // Priority: failed > todo > skipped > passed
+   const statusPriority = { failed: 3, todo: 2, skipped: 1, passed: 0 };
+   const prevPriority = statusPriority[prevStatus] ?? -1;
+   const newPriority = statusPriority[newStatus] ?? -1;
+   newStatus = prevPriority > newPriority ? prevStatus : newStatus;
+ }
+ storyAssertions[storyId] = {
+   status: newStatus,
```
```ts
const coverageData = JSON.parse(
  await fs.readFile(finalCoveragePath, 'utf8'),
);
```
I prefer the modern dynamic import of JSON files now supported in Node over readFile+parse. It gives nicer errors and provides type safety when the path is static; with dynamic paths it's just about being consistent. There are a bunch of instances of this already in the code base that you can find, but it's something like this, if my memory is correct:
Suggested change:

```diff
- const coverageData = JSON.parse(
-   await fs.readFile(finalCoveragePath, 'utf8'),
- );
+ const { default: coverageData } = await import(finalCoveragePath, { with: { type: 'json' } });
```
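
(One caveat if this route is taken: dynamic import() resolves module specifiers rather than plain filesystem paths, so an absolute path like finalCoveragePath may need to be converted with url.pathToFileURL(...).href first, especially on Windows.)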
| "eval:lint": "eslint .", | ||
| "preview": "vite preview", | ||
| "eval:test": "vitest run --reporter json --outputFile ../results/tests.json", | ||
| "eval:test:coverage": "vitest run --coverage --reporter json --outputFile ../results/tests.json", |
nit
| "eval:test:coverage": "vitest run --coverage --reporter json --outputFile ../results/tests.json", | |
| "eval:test:coverage": "pnpm run eval:test --coverage", |
```ts
p.log.message(
  cov
    ? `📊 Coverage: lines ${formatCov(cov.lines)}, statements ${formatCov(cov.statements)}, branches ${formatCov(cov.branches)}, functions ${formatCov(cov.functions)}`
    : '📊 Coverage: (not collected)',
);
```
I think the purpose of these log lines is to give a very quick overview of the result, and this is a bit too verbose for that right now.
I'd suggest something that outputs:
Coverage: ✅ 96 %
OR
Coverage: ⚠️ 80 %
OR
Coverage: ❌ 60 %
With some high and low watermarks, like: ❌ < 70 % <
(I'm not married to those numbers at all)
I would probably also just not show "(not collected)" at all in that scenario, because it makes me think that I, as a user, configured something wrong - maybe?
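
A minimal sketch of that idea, assuming cov.lines carries an istanbul-style pct field, and with purely illustrative watermarks (the reviewer explicitly left the numbers open):

```ts
// Illustrative thresholds only.
const HIGH_WATERMARK = 90;
const LOW_WATERMARK = 70;

function formatCoverageLine(cov?: { lines: { pct: number } }): string | undefined {
  if (!cov) return undefined; // omit the line entirely when coverage wasn't collected
  const pct = Math.round(cov.lines.pct);
  const icon = pct >= HIGH_WATERMARK ? '✅' : pct >= LOW_WATERMARK ? '⚠️' : '❌';
  return `Coverage: ${icon} ${pct} %`;
}

const coverageLine = formatCoverageLine(cov);
if (coverageLine) p.log.message(coverageLine);
```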
Running a dev-context eval now yields a full coverage report alongside the existing build/test/typecheck outputs. A new Coverage page renders totals and syntax-highlighted, expandable per-file views with line/branch indicators, and the CLI now reports coverage percentages.
Additionally, the test flow is now decomposed into focused helpers (run tests, parse results, write artifacts, compute coverage).
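
For reference, the coverage totals described above can be derived from Vitest's istanbul JSON output roughly like this; a sketch of the istanbul-lib-coverage API (assuming the usual @types shape), not necessarily the PR's exact coverage.ts:

```ts
import {
  createCoverageMap,
  createCoverageSummary,
  type CoverageMapData,
} from 'istanbul-lib-coverage';

// `coverageData` is the parsed contents of the coverage JSON that
// `vitest run --coverage` emits (e.g. coverage/coverage-final.json).
export function summarizeCoverage(coverageData: CoverageMapData) {
  const map = createCoverageMap(coverageData);
  const totals = createCoverageSummary();

  for (const file of map.files()) {
    // Each file summary exposes lines/statements/branches/functions, each with a pct field.
    totals.merge(map.fileCoverageFor(file).toSummary());
  }

  return {
    lines: totals.lines.pct,
    statements: totals.statements.pct,
    branches: totals.branches.pct,
    functions: totals.functions.pct,
  };
}
```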