Implement sampling #6

ezynda3 · 2024-12-05T14:38:40Z

The Model Context Protocol (MCP) provides a standardized way for servers to request LLM sampling ("completions" or "generations") from language models via clients. This flow allows clients to maintain control over model access, selection, and permissions while enabling servers to leverage AI capabilities—with no server API keys necessary. Servers can request text or image-based interactions and optionally include context from MCP servers in their prompts.

User Interaction Model

Sampling in MCP allows servers to implement agentic behaviors, by enabling LLM calls to occur nested inside other MCP server features.

Implementations are free to expose sampling through any interface pattern that suits their needs—the protocol itself does not mandate any specific user interaction model.

{{< callout type="warning" >}}
For trust & safety and security, there SHOULD always be a human in the loop with the ability to deny sampling requests.

Applications SHOULD:

Provide UI that makes it easy and intuitive to review sampling requests
Allow users to view and edit prompts before sending
Present generated responses for review before delivery
{{< /callout >}}

Capabilities

Clients that support sampling MUST declare the sampling capability during [initialization]({{< ref "/specification/basic/lifecycle#initialization" >}}):

{
  "capabilities": {
    "sampling": {}
  }
}

Protocol Messages

Creating Messages

To request a language model generation, servers send a sampling/createMessage request:

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What is the capital of France?"
        }
      }
    ],
    "modelPreferences": {
      "hints": [
        {
          "name": "claude-3-sonnet"
        }
      ],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a helpful assistant.",
    "maxTokens": 100
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "The capital of France is Paris."
    },
    "model": "claude-3-sonnet-20240307",
    "stopReason": "endTurn"
  }
}
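As a rough illustration, the request and response above could be modeled in Go along these lines. This is only a sketch: every type and function name here (`TextContent`, `Message`, `CreateMessageRequest`, `NewCreateMessageRequest`, `ParseCreateMessageResponse`) is hypothetical and mirrors the JSON shown above, not the API of mcp-go or any other library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types that mirror the example JSON; not a library API.
type TextContent struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

type Message struct {
	Role    string      `json:"role"`
	Content TextContent `json:"content"`
}

type CreateMessageParams struct {
	Messages     []Message `json:"messages"`
	SystemPrompt string    `json:"systemPrompt,omitempty"`
	MaxTokens    int       `json:"maxTokens"`
}

type CreateMessageRequest struct {
	JSONRPC string              `json:"jsonrpc"`
	ID      int64               `json:"id"`
	Method  string              `json:"method"`
	Params  CreateMessageParams `json:"params"`
}

type CreateMessageResult struct {
	Role       string      `json:"role"`
	Content    TextContent `json:"content"`
	Model      string      `json:"model"`
	StopReason string      `json:"stopReason"`
}

// NewCreateMessageRequest builds a single-turn, text-only sampling request.
func NewCreateMessageRequest(id int64, prompt string, maxTokens int) CreateMessageRequest {
	return CreateMessageRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "sampling/createMessage",
		Params: CreateMessageParams{
			Messages: []Message{
				{Role: "user", Content: TextContent{Type: "text", Text: prompt}},
			},
			MaxTokens: maxTokens,
		},
	}
}

// ParseCreateMessageResponse extracts the "result" object from a response.
func ParseCreateMessageResponse(raw []byte) (CreateMessageResult, error) {
	var resp struct {
		Result *CreateMessageResult `json:"result"`
	}
	if err := json.Unmarshal(raw, &resp); err != nil {
		return CreateMessageResult{}, err
	}
	if resp.Result == nil {
		return CreateMessageResult{}, fmt.Errorf("response has no result")
	}
	return *resp.Result, nil
}

func main() {
	req := NewCreateMessageRequest(1, "What is the capital of France?", 100)
	out, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(out))
}
```

Model preferences are left out of this sketch to keep it short.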

Message Flow

sequenceDiagram
    participant Server
    participant Client
    participant User
    participant LLM

    Note over Server,Client: Server initiates sampling
    Server->>Client: sampling/createMessage

    Note over Client,User: Human-in-the-loop review
    Client->>User: Present request for approval
    User-->>Client: Review and approve/modify

    Note over Client,LLM: Model interaction
    Client->>LLM: Forward approved request
    LLM-->>Client: Return generation

    Note over Client,User: Response review
    Client->>User: Present response for approval
    User-->>Client: Review and approve/modify

    Note over Server,Client: Complete request
    Client-->>Server: Return approved response
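On the client side, the flow in the diagram could be sketched roughly as below. The `Reviewer` and `Generator` hooks and the `HandleSampling` function are hypothetical names introduced for illustration; how a real client presents prompts to the user or calls its model is up to the implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRejected is returned when the user denies the prompt or the response.
var ErrRejected = errors.New("user rejected sampling request")

// Reviewer is a hypothetical human-in-the-loop hook. It returns the
// (possibly edited) text and whether the user approved it.
type Reviewer func(text string) (edited string, ok bool)

// Generator is a hypothetical stand-in for the client's call to its LLM.
type Generator func(prompt string) (string, error)

// HandleSampling follows the diagram: review the request, call the model,
// review the response, then hand the approved text back to the server.
func HandleSampling(prompt string, reviewPrompt Reviewer, generate Generator, reviewResponse Reviewer) (string, error) {
	approved, ok := reviewPrompt(prompt)
	if !ok {
		return "", ErrRejected
	}
	out, err := generate(approved)
	if err != nil {
		return "", fmt.Errorf("model call failed: %w", err)
	}
	final, ok := reviewResponse(out)
	if !ok {
		return "", ErrRejected
	}
	return final, nil
}

func main() {
	approve := func(s string) (string, bool) { return s, true }
	echo := func(p string) (string, error) { return "echo: " + p, nil }
	out, err := HandleSampling("hello", approve, echo, approve)
	fmt.Println(out, err)
}
```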

Data Types

Messages

Sampling messages can contain:

Text Content

{
  "type": "text",
  "text": "The message content"
}

Image Content

{
  "type": "image",
  "data": "base64-encoded-image-data",
  "mimeType": "image/jpeg"
}
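One possible way to represent and sanity-check these two content shapes in Go is sketched below. The `Content` struct and its `Validate` method are hypothetical, and the checks (standard padded base64, a MIME type starting with `image/`) are assumptions of this sketch rather than requirements stated above.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// Content is a hypothetical flat representation of text and image content.
type Content struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	Data     string `json:"data,omitempty"`
	MimeType string `json:"mimeType,omitempty"`
}

// Validate applies basic checks per content type. The image checks are
// assumptions of this sketch, not rules taken from the specification.
func (c Content) Validate() error {
	switch c.Type {
	case "text":
		return nil
	case "image":
		if !strings.HasPrefix(c.MimeType, "image/") {
			return fmt.Errorf("unexpected mimeType %q", c.MimeType)
		}
		if _, err := base64.StdEncoding.DecodeString(c.Data); err != nil {
			return fmt.Errorf("invalid base64 data: %w", err)
		}
		return nil
	default:
		return fmt.Errorf("unsupported content type %q", c.Type)
	}
}

func main() {
	for _, c := range []Content{
		{Type: "text", Text: "The message content"},
		{Type: "image", Data: "aGVsbG8=", MimeType: "image/jpeg"},
		{Type: "image", Data: "not base64!", MimeType: "image/jpeg"},
	} {
		fmt.Println(c.Type, c.Validate())
	}
}
```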

Model Preferences

Model selection in MCP requires careful abstraction since servers and clients may use different AI providers with distinct model offerings. A server cannot simply request a specific model by name since the client may not have access to that exact model or may prefer to use a different provider's equivalent model.

To solve this, MCP implements a preference system that combines abstract capability priorities with optional model hints:

Capability Priorities

Servers express their needs through three normalized priority values (0-1):

costPriority: How important is minimizing costs? Higher values prefer cheaper models.
speedPriority: How important is low latency? Higher values prefer faster models.
intelligencePriority: How important are advanced capabilities? Higher values prefer more capable models.

Model Hints

While priorities help select models based on characteristics, hints allow servers to suggest specific models or model families:

Hints are treated as substrings that can match model names flexibly
Multiple hints are evaluated in order of preference
Clients MAY map hints to equivalent models from different providers
Hints are advisory—clients make final model selection

For example:

{
  "hints": [
    {"name": "claude-3-sonnet"},  // Prefer Sonnet-class models
    {"name": "claude"}            // Fall back to any Claude model
  ],
  "costPriority": 0.3,            // Cost is less important
  "speedPriority": 0.8,           // Speed is very important
  "intelligencePriority": 0.5     // Moderate capability needs
}

The client processes these preferences to select an appropriate model from its available options. For instance, if the client doesn't have access to Claude models but has Gemini, it might map the sonnet hint to gemini-1.5-pro based on similar capabilities.
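A client's selection logic might look something like the following sketch. It tries hints in order as substring matches, and falls back to a weighted score over all available models when no hint matches. The `Model` and `Preferences` types, the per-model scores, and the scoring formula are all assumptions made up for this example; mapping a hint to an equivalent model from another provider is not attempted here.

```go
package main

import (
	"fmt"
	"strings"
)

// Model is a hypothetical catalog entry. The 0-1 scores are invented for
// illustration; higher is better on every axis.
type Model struct {
	Name                           string
	Cheapness, Speed, Intelligence float64
}

// Preferences mirrors the hints and priorities described above.
type Preferences struct {
	Hints                                             []string
	CostPriority, SpeedPriority, IntelligencePriority float64
}

// score weights each model attribute by the matching priority.
func score(p Preferences, m Model) float64 {
	return p.CostPriority*m.Cheapness + p.SpeedPriority*m.Speed + p.IntelligencePriority*m.Intelligence
}

// best returns the highest-scoring model, or false if the list is empty.
func best(p Preferences, models []Model) (Model, bool) {
	if len(models) == 0 {
		return Model{}, false
	}
	b := models[0]
	for _, m := range models[1:] {
		if score(p, m) > score(p, b) {
			b = m
		}
	}
	return b, true
}

// SelectModel evaluates hints in order of preference; the first hint with
// any substring match wins. With no match, priorities alone decide.
func SelectModel(p Preferences, available []Model) (Model, bool) {
	for _, hint := range p.Hints {
		var matches []Model
		for _, m := range available {
			if strings.Contains(m.Name, hint) {
				matches = append(matches, m)
			}
		}
		if m, ok := best(p, matches); ok {
			return m, true
		}
	}
	return best(p, available)
}

func main() {
	prefs := Preferences{
		Hints:        []string{"claude-3-sonnet", "claude"},
		CostPriority: 0.3, SpeedPriority: 0.8, IntelligencePriority: 0.5,
	}
	available := []Model{
		{"gemini-1.5-pro", 0.4, 0.5, 0.8},
		{"gemini-1.5-flash", 0.9, 0.9, 0.5},
	}
	m, ok := SelectModel(prefs, available)
	fmt.Println(m.Name, ok)
}
```

Because hints are advisory, a real client is free to weigh them differently or ignore them.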

Error Handling

Clients SHOULD return errors for common failure cases:

Example error:

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -1,
    "message": "User rejected sampling request"
  }
}
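For illustration, a client could build the error above in Go roughly as follows. `RPCError`, `ErrorResponse`, and `EncodeRejection` are hypothetical names; the code and message values are simply copied from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RPCError and ErrorResponse are hypothetical types mirroring the example.
type RPCError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type ErrorResponse struct {
	JSONRPC string   `json:"jsonrpc"`
	ID      int64    `json:"id"`
	Error   RPCError `json:"error"`
}

// EncodeRejection returns the JSON error sent when the user denies the
// sampling request with the given id.
func EncodeRejection(id int64) (string, error) {
	b, err := json.Marshal(ErrorResponse{
		JSONRPC: "2.0",
		ID:      id,
		Error:   RPCError{Code: -1, Message: "User rejected sampling request"},
	})
	return string(b), err
}

func main() {
	s, err := EncodeRejection(1)
	fmt.Println(s, err)
}
```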

Security Considerations

Clients SHOULD implement user approval controls
Both parties SHOULD validate message content
Clients SHOULD respect model preference hints
Clients SHOULD implement rate limiting
Both parties MUST handle sensitive data appropriately


huhu415 · 2025-03-29T10:41:25Z

I see that mcp-go already has a SamplingMessage data structure. Is it usable yet? Could the author give an example to try?
Is there a client that can be used to debug sampling? The Claude client does not support it, so which clients do?

ezynda3 · 2025-03-29T13:50:12Z

The sampling message was generated from the official schema, but the actual functionality has not been implemented.

huhu415 · 2025-03-29T14:58:41Z

I looked at the official documentation, and adding this feature on the server side does not seem difficult: the server sends a request to the client, and the client returns the content.

The difficulty is not the server side. It is that no client supports this feature, so I have no way to debug it.

huhu415 · 2025-03-29T16:32:52Z

I can see why the author delayed adding this feature. I wrote a little code, looked at the architecture, and realized that sampling is hard to add.

I am going to change the architecture of the server.

huhu415 · 2025-03-29T17:13:51Z

The processInputStream function may need to change.

Can the author give me some guidance? What changes are allowed?

I think it should be written like this: one goroutine per request ID, in the style of a TCP server, with a lock controlling writes to the output. The current architecture does not work this way. With this approach, all messages would be handled in a uniform way.
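A minimal sketch of that idea, assuming newline-delimited JSON-RPC over a reader/writer pair. `Session`, `envelope`, `Call`, and `ReadLoop` are hypothetical names and not mcp-go's actual API; timeouts, error responses, and shutdown are left out.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// envelope is a hypothetical catch-all for JSON-RPC requests and responses.
type envelope struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      *int64          `json:"id,omitempty"`
	Method  string          `json:"method,omitempty"`
	Params  json.RawMessage `json:"params,omitempty"`
	Result  json.RawMessage `json:"result,omitempty"`
}

// Session is a hypothetical type, not part of mcp-go.
type Session struct {
	writeMu sync.Mutex // serializes writes to out
	out     io.Writer
	mu      sync.Mutex // guards nextID and pending
	nextID  int64
	pending map[int64]chan envelope
}

// write emits one message per line while holding the output lock.
func (s *Session) write(e envelope) error {
	b, err := json.Marshal(e)
	if err != nil {
		return err
	}
	s.writeMu.Lock()
	defer s.writeMu.Unlock()
	_, err = s.out.Write(append(b, '\n'))
	return err
}

// Call sends a server-to-client request (such as sampling/createMessage)
// and blocks until the response with the same id arrives.
func (s *Session) Call(method string, params json.RawMessage) (envelope, error) {
	s.mu.Lock()
	s.nextID++
	id := s.nextID
	ch := make(chan envelope, 1)
	s.pending[id] = ch
	s.mu.Unlock()
	if err := s.write(envelope{JSONRPC: "2.0", ID: &id, Method: method, Params: params}); err != nil {
		return envelope{}, err
	}
	return <-ch, nil
}

// ReadLoop routes responses to the waiting Call and runs every incoming
// request or notification in its own goroutine.
func (s *Session) ReadLoop(in io.Reader, handle func(envelope)) error {
	sc := bufio.NewScanner(in)
	for sc.Scan() {
		var e envelope
		if json.Unmarshal(sc.Bytes(), &e) != nil {
			continue // skip malformed lines in this sketch
		}
		if e.Method != "" || e.ID == nil {
			go handle(e)
			continue
		}
		s.mu.Lock()
		ch, ok := s.pending[*e.ID]
		delete(s.pending, *e.ID)
		s.mu.Unlock()
		if ok {
			ch <- e
		}
	}
	return sc.Err()
}

// demo wires a Session to a fake client that answers the first request.
func demo() (string, error) {
	clientIn, serverOut := io.Pipe()
	serverIn, clientOut := io.Pipe()
	s := &Session{out: serverOut, pending: make(map[int64]chan envelope)}
	go s.ReadLoop(serverIn, func(envelope) {})
	go func() {
		sc := bufio.NewScanner(clientIn)
		sc.Scan()
		var req envelope
		json.Unmarshal(sc.Bytes(), &req)
		fmt.Fprintf(clientOut, `{"jsonrpc":"2.0","id":%d,"result":{"role":"assistant"}}`+"\n", *req.ID)
	}()
	resp, err := s.Call("sampling/createMessage", json.RawMessage(`{"maxTokens":100}`))
	return string(resp.Result), err
}

func main() { fmt.Println(demo()) }
```

The point of the pending map is that a handler goroutine can issue a sampling request mid-call and wait for the reply without blocking the read loop.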

ezynda3 added the help wanted Extra attention is needed label Dec 5, 2024
