Skip to content

Conversation

@JReinhold
Copy link
Contributor

@JReinhold JReinhold commented Dec 9, 2025

This PR introduces another context type: Storybook MCP - Dev.
When using that, Storybook is set up in the experiment, its dev server is started and the dev server's MCP server is configured for the agent.

The existing Storybook MCP context has been renamed to Storybook MCP - Docs, to signify it's the docs-only (ie "remote") Storybook MCP version.

@changeset-bot
Copy link

changeset-bot bot commented Dec 9, 2025

⚠️ No Changeset found

Latest commit: 88a9cba

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

💥 An error occurred when fetching the changed packages and changesets in this PR
Some errors occurred when validating the changesets config:
The package or glob expression "@storybook/mcp-eval*" is specified in the `ignore` option but it is not found in the project. You may have misspelled the package name or provided an invalid glob expression. Note that glob expressions must be defined according to https://www.npmjs.com/package/micromatch.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 9, 2025

npm i https://pkg.pr.new/storybookjs/mcp/@storybook/addon-mcp@109
npm i https://pkg.pr.new/storybookjs/mcp/@storybook/mcp@109

commit: 88a9cba

@codecov
Copy link

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.82%. Comparing base (4648a4b) to head (88a9cba).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #109   +/-   ##
=======================================
  Coverage   87.82%   87.82%           
=======================================
  Files          19       19           
  Lines         427      427           
  Branches      122      122           
=======================================
  Hits          375      375           
  Misses          8        8           
  Partials       44       44           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 290 to 308
// Check for MCP server status in init message
if (
parsed.type === 'system' &&
parsed.subtype === 'init' &&
parsed.mcp_servers &&
parsed.mcp_servers.length > 0
) {
for (const server of parsed.mcp_servers) {
if (server.status === 'connected') {
clackLog.success(`MCP server "${server.name}" connected`);
} else {
clackLog.error(
`MCP server "${server.name}" failed to connect (status: ${server.status})`,
);
process.exit(1);
}
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small QoL improvement, logging which MCP server actually successfully connected. If any MCP server is configured but failed to connect, it bails the whole eval early. I doubt that is ever the intention anyway.

Image

@JReinhold JReinhold marked this pull request as ready for review December 9, 2025 12:33
Copilot AI review requested due to automatic review settings December 9, 2025 12:33
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new "Storybook MCP - Dev" context mode that sets up and runs a local Storybook development server with the MCP addon, while renaming the existing remote manifest-based context to "Storybook MCP - Docs" for clarity. This enables testing agents in a live development environment versus documentation-only scenarios.

Key Changes

  • New storybook-mcp-dev context type that installs Storybook, starts a dev server on a random port, and provides HTTP MCP endpoint
  • Added dev server lifecycle management with automatic startup during experiment preparation and teardown after agent execution
  • Renamed existing Storybook MCP context to "Storybook MCP - Docs" throughout UI, documentation, and code

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
eval/types.ts Adds storybook-mcp-dev context type definition
eval/lib/storybook-dev-server.ts New module for managing Storybook dev server lifecycle (port allocation, process management, readiness detection)
eval/lib/prepare-experiment.ts Enhanced to install Storybook packages and start dev server for the new context, returns MCP config
eval/lib/teardown-experiment.ts New cleanup module to stop dev server after experiment
eval/lib/agents/claude-code-cli.ts Added MCP server connection validation and logging
eval/lib/collect-args.ts Updated CLI parsing and help text to support new context mode
eval/eval.ts Integrates teardown step and uses prepared MCP config from experiment setup
eval/lib/save/google-sheet.ts Handles new context type in reporting
eval/lib/evaluations/prepare-evaluations.ts Adds @storybook/addon-mcp to evaluation dependencies
eval/templates/evaluation/.storybook/main.ts Simplified config, removed getAbsolutePath helper, added @storybook/addon-mcp
eval/README.md Documentation updates for five context modes instead of four
.github/instructions/eval.instructions.md Comprehensive documentation of new context mode and architecture
eval/evals/100-flight-booking-plain/prompt.md Minor formatting fix (extra blank line)
eval/evals/100-flight-booking-plain/explicit-stories.md New file with story-writing reminder

parsed.mcp_servers.length > 0
) {
for (const server of parsed.mcp_servers) {
if (server.status === 'connected') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage, could the MCP server be pending? Could you let us know how we know it has successfully connected at this stage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. I added a comment clarifying what this connection status means.

this is the status as reported by the agent, so this is the source of truth. The agent initialises connections with all servers when the session starts, and if any servers are not ready by this point, they won't be used AFAIK.

};

function getContextDetails(context: Context): string {
switch (context.type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical improvement:
I personally like to have exhausting type checks for the default fall-through case, like demonstrated here: https://www.typescriptlang.org/play/?#code/KYOwrgtgBAwg9gGzgJygbwFBSgJWAEwBosoBxZYUY7AIQTGAwF8MMAzMEAYwBcBLOCCgBzYD3hJkAOQCGEYAAouALliIUASlUBnHsj4hh6EtoDufHlwAWUJRuPZsXGduBrJAOjz5lJR1AoeMGQhACIKfFCAbj8oZ1d3FA9yShBff2xA4LDhClBo2IB6QqgAMRRhOB4oGQAjODBqugZY-GA2GTAEHnSMrkFdKGAADytO3T4AN2AYK2AuAGtVEGBp1ABeOJiMqB4rZDhTKBWjgFFkA+QFAAMAVRAxkHwEAjj1VHjgVQASNBGxsATaazeYLJjXDTbKAsJhAA

So whenever the union context.type is extended, TypeScript will error to signal that adjustments are necessary here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. However it's already covered without the default case, because the function's return type errors in that scenario too, without a default: https://www.typescriptlang.org/play/?#code/KYOwrgtgBAwg9gGzgJygbwFBSgJWAEwBosoBxZYUY7AIQTGAwF8MMAzMEAYwBcBLOCCgBzYD3hJkAOQCGEYAAouALliIUASlUBnHsj4hh6EtoDufHlwAWUJRuPZsXGduBrJAOjz5lJR1AoeMGQhACIKfFCAbj8oZ1d3FA9yShBff2xA4LDhClBo2IB6QqgAMRRhOB4oGQAjODBqugYSFiYgA

which is also the case for this code specifically.

however I did something that I like even better, I configured the lint rule to handle this:

So we don't have to remember to add the runtime code to ensure switches are exhaustive.

process: Result,
timeout: number = 10_000,
): Promise<void> {
return await new Promise<void>((resolve, reject) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you returning a new Promise inside of an async function, instead of using await inside and throwing errors directly? Is it because of the onExit callback function, which is passed to the off,on event listeners?

Copy link
Contributor Author

@JReinhold JReinhold Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the basic pattern of turning callbacks into promises - onData, onExit and setTimeout. I don't think there's any other way of doing it, except for Promise.withResolvers which I'm fine with too.

Even though the function is async, there's nothing we could await here.

@JReinhold JReinhold changed the title add Storybook Dev context, rename Storybook MCP to Storybook Docs Eval: Add Storybook Dev context, rename Storybook MCP to Storybook Doc Dec 9, 2025
@JReinhold JReinhold changed the title Eval: Add Storybook Dev context, rename Storybook MCP to Storybook Doc Eval: Add Storybook Dev context, rename Storybook MCP to Storybook Docs Dec 9, 2025
Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings December 9, 2025 19:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@JReinhold JReinhold merged commit c2c7920 into main Dec 10, 2025
11 checks passed
@JReinhold JReinhold deleted the sb-dev-eval branch December 10, 2025 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants