Skip to content

fix: wrap MCP server output as JSON when structured logging enabled #713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

devin-ai-integration[bot]
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot commented Jul 4, 2025

Fix MCP server JSON output and integrate telemetry tracking

Summary

This PR addresses the issue where PyAirbyte's MCP server was printing non-JSON output to STDOUT, confusing MCP clients. The changes implement structured logging throughout the MCP server code and integrate comprehensive telemetry tracking as requested in the GitHub comments.

Key changes:

  1. Structured logging: Replaced all print() statements in MCP server with log_mcp_message() calls that respect AIRBYTE_STRUCTURED_LOGGING setting
  2. Environment variable standardization: Moved NO_LIVE_PROGRESS to airbyte.constants with AIRBYTE_NO_LIVE_PROGRESS prefix and backward compatibility fallback
  3. Combined telemetry approach: Implemented both MCP session flagging (via get_env_flags()) and specific tool tracking (via EventType.MCP_TOOL_CALL)

The solution ensures that when AIRBYTE_STRUCTURED_LOGGING=true, all MCP server output on STDOUT is valid JSON, while stderr can contain non-JSON messages.

Review & Testing Checklist for Human

⚠️ High Priority (4 items - requires careful verification):

  • End-to-end MCP testing: Run MCP server with source-faker and verify all STDOUT output is valid JSON when AIRBYTE_STRUCTURED_LOGGING=true
  • Environment variable fallback: Test that AIRBYTE_NO_LIVE_PROGRESS takes precedence, NO_LIVE_PROGRESS is used as fallback, and default is "false"
  • Telemetry integration: Verify that MCP operations send telemetry with both session flagging and tool-specific tracking without breaking existing functionality
  • Rich progress disabling: Confirm that NO_LIVE_PROGRESS=1 actually prevents Rich live progress displays during MCP operations

Recommended test plan:

# Test structured logging output
export AIRBYTE_STRUCTURED_LOGGING=true
export AIRBYTE_NO_LIVE_PROGRESS=true
# Run MCP server with faker source and pipe STDOUT to verify JSON validity
python -m airbyte.mcp.server | jq . # Should parse all lines as valid JSON

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    MCP["airbyte/mcp/server.py<br/>MCP Server Entry Point"]:::major-edit
    LocalOps["airbyte/mcp/_local_ops.py<br/>Local Operations"]:::major-edit
    Util["airbyte/mcp/_util.py<br/>MCP Utilities & Logging"]:::major-edit
    Constants["airbyte/constants.py<br/>Environment Constants"]:::major-edit
    Telemetry["airbyte/_util/telemetry.py<br/>Telemetry System"]:::major-edit
    Logs["airbyte/logs.py<br/>Structured Logging Config"]:::context

    MCP --> Util
    LocalOps --> Util
    Util --> Constants
    Util --> Telemetry
    Util --> Logs
    
    MCP -.->|"replaces print() calls"| Util
    LocalOps -.->|"replaces print() calls"| Util
    Constants -.->|"NO_LIVE_PROGRESS config"| Util
    Telemetry -.->|"MCP session flagging"| Util

    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit
        L3[Context/No Edit]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF
Loading

Notes

  • Session context: This work was requested by @aaronsteers and addresses GitHub comments on the original PR
  • Testing limitation: Due to environment setup issues (BigQuery cache errors), end-to-end testing was not possible during development
  • Telemetry approach: Implements the combined strategy suggested by @aaronsteers - both session-level flagging and tool-specific event tracking
  • Backward compatibility: Environment variable changes maintain backward compatibility with existing NO_LIVE_PROGRESS usage

Link to Devin session: https://app.devin.ai/sessions/84e33c5b00244873b05e4bd73a4f2504
Requested by: @aaronsteers

- Add get_mcp_logger() and log_mcp_message() functions to airbyte/mcp/_util.py
- Replace all print() calls in MCP server with structured logging calls
- Add NO_LIVE_PROGRESS constant to airbyte/constants.py following _str_to_bool() pattern
- Set NO_LIVE_PROGRESS=1 programmatically to prevent Rich progress interference
- Support both JSON and plain text output based on AIRBYTE_STRUCTURED_LOGGING setting
- Maintain backward compatibility when structured logging is disabled

Co-Authored-By: AJ Steers <[email protected]>
Copy link
Contributor Author

Original prompt from AJ Steers:

@Devin - The new PyAirbyte MCP server is confusing MCP clients with extra info printed to STDOUT. Can you use the constants like STRUCTURED_LOGGING or similar to make sure that whatever we print from the PyAirbyte MCP tools server is wrapped as JSON?

Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

- Update **kwargs type annotation from Any to str | float | bool | None
- Split function signature across multiple lines to fix line length limit
- Remove redundant int type as suggested by PYI041 rule

Co-Authored-By: AJ Steers <[email protected]>
Copy link

github-actions bot commented Jul 5, 2025

PyTest Results (Fast Tests Only, No Creds)

722 tests  ±0   722 ✅ ±0   17m 9s ⏱️ -2s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit ee30db6. ± Comparison against base commit 816fa44.

♻️ This comment has been updated with latest results.

devin-ai-integration bot and others added 2 commits July 5, 2025 00:13
…OL_CALL and session flagging

- Add EventType.MCP_TOOL_CALL to telemetry enum for specific MCP tool tracking
- Create log_mcp_tool_call() function for MCP tool telemetry
- Update log_mcp_message() to send telemetry for MCP tool calls with proper state mapping
- Keep MCP session flag in get_env_flags() to tag all operations when called from MCP
- Fix type compatibility issue between kwargs and log_mcp_tool_call parameters
- Address GitHub comments from @aaronsteers about combined telemetry approach

This implements the dual telemetry strategy:
1. MCP session flagging: All telemetry events (sync, install, check) get MCP flag via get_env_flags()
2. Specific tool tracking: Individual MCP tool calls tracked via EventType.MCP_TOOL_CALL

Co-Authored-By: AJ Steers <[email protected]>
- Update Python version markers for dependencies
- Side effect of running poetry commands during development

Co-Authored-By: AJ Steers <[email protected]>
Copy link

github-actions bot commented Jul 5, 2025

PyTest Results (Full)

784 tests  ±0   770 ✅ ±0   29m 22s ⏱️ - 2m 28s
  1 suites ±0    14 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit ee30db6. ± Comparison against base commit 816fa44.

Copy link
Contributor Author

Closing due to inactivity for more than 7 days. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant