-
Notifications
You must be signed in to change notification settings - Fork 58
feat(mcp): add local dev script tool #711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Implement 5 MCP tools: list_connectors, get_config_spec, create_config_markdown, validate_config, run_sync - Add secret detection to prevent hardcoded credentials in configs - Support stdio transport for MCP client integration - Leverage existing PyAirbyte APIs for connector operations Co-Authored-By: AJ Steers <[email protected]>
- Replace lowlevel Server with FastMCP framework using @app.tool() decorators - Change server name from 'pyairbyte-mcp' to 'airbyte-mcp' as requested - Use existing print_config_spec() method for markdown generation - Add type ignore for FastMCP run() method mypy false positive - All 5 MCP tools working: list_connectors, get_config_spec, create_config_markdown, validate_config, run_sync - Secret detection using JSON schema patterns (writeOnly, format, field names) - Passes ruff formatting, linting, and mypy type checking Co-Authored-By: AJ Steers <[email protected]>
- Remove all type annotations from @app.tool() decorated functions to fix issubclass() TypeError - FastMCP framework requires untyped parameters for proper tool registration - All 5 MCP tools now working: list_connectors, get_config_spec, create_config_markdown, validate_config, run_sync - Secret detection logic works but source-faker has no secret fields to test against - Server successfully initializes and responds to tool calls Co-Authored-By: AJ Steers <[email protected]>
…rrors - Add type annotations to all FastMCP tool functions - Use modern Python 3.10+ type syntax (str | None instead of Optional[str]) - Replace deprecated typing imports (Dict/List) with built-in types (dict/list) - Fix import sorting and organization - All type checking now passes with mypy Co-Authored-By: AJ Steers <[email protected]>
…t parameter - Change default output format from JSON to YAML as requested by aaronsteers - Add output_format parameter to allow caller to override with 'json' if preferred - Fix parameter name shadowing Python builtin 'format' - Remove unnecessary else statement and inline import - Add yaml import at top level for better organization Co-Authored-By: AJ Steers <[email protected]>
…onnectors, remove type annotations for FastMCP compatibility Co-Authored-By: AJ Steers <[email protected]>
…quirements Co-Authored-By: AJ Steers <[email protected]>
… all lint errors - Replace Union[str, None] with str | None syntax per ruff UP007 rule - Remove unused Optional import to fix F401 lint error - Add proper type annotations to all @app.tool() functions - All 15 ruff lint errors now resolved - All 155 unit tests continue to pass - FastMCP framework compatibility maintained Co-Authored-By: AJ Steers <[email protected]>
…FastMCP compatibility - Add examples/run_mcp_server.py with comprehensive MCP tools demonstration - Remove type annotations from @app.tool() functions to fix FastMCP TypeError - Example shows all 5 MCP tools: list_connectors, get_config_spec, validate_config, run_sync - Follows PyAirbyte examples directory conventions with poetry run execution - Includes both interactive demo and server startup functionality Co-Authored-By: AJ Steers <[email protected]>
…patibility FastMCP framework cannot handle type annotations on decorated functions due to introspection limitations. This creates a necessary trade-off between FastMCP compatibility and lint requirements. The MCP server is now fully functional with all 5 tools working correctly. Co-Authored-By: AJ Steers <[email protected]>
…tibility - Add ruff: noqa: ANN001, ANN201 to suppress type annotation lint errors - Add mypy: disable-error-code=no-untyped-def to suppress mypy errors - FastMCP framework cannot handle type annotations on decorated functions - All local lint checks now pass Co-Authored-By: AJ Steers <[email protected]>
…alert - Changed 'field_value in os.environ' to 'os.environ.get(field_value) is not None' - Maintains identical functionality while avoiding CodeQL security warning - Follows existing PyAirbyte patterns for safe environment variable handling Co-Authored-By: AJ Steers <[email protected]>
Co-Authored-By: AJ Steers <[email protected]>
- Demonstrates PydanticAI agent connecting to PyAirbyte MCP server via stdio - Uses GitHub Models OpenAI-compatible endpoint for LLM inference - Syncs Pokemon data for Pikachu, Charizard, and Bulbasaur from PokeAPI - Handles intentional misspelling correction (Bulbsaur → Bulbasaur) - Validates configurations and syncs to default DuckDB cache - Provides comprehensive example of MCP tool usage with AI agent Co-Authored-By: AJ Steers <[email protected]>
…ibility issues - Replace manual stdio stream management with FastMCP's run_stdio_async() method - Fixes 'Unknown transport: MemoryObjectReceiveStream' error - Resolves 'Already running asyncio in this thread' runtime error - Enables successful MCP CLI communication with PyAirbyte server - Tested with postgres connector queries via devin-mcp-cli Co-Authored-By: AJ Steers <[email protected]>
- Clean up unused import after switching to run_stdio_async() - Apply ruff formatting to examples file Co-Authored-By: AJ Steers <[email protected]>
…ected' error - Add source.select_all_streams() call before source.read() in run_sync function - Resolves PyAirbyteNoStreamsSelectedError when running sync operations - Tested with source-pokeapi connector successfully Co-Authored-By: AJ Steers <[email protected]>
- Add six ^1.17.0 dependency to pyproject.toml - Update poetry.lock with new dependency resolution - Resolves ModuleNotFoundError: No module named 'six' when starting MCP server Co-Authored-By: AJ Steers <[email protected]>
…ration (#692) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: AJ Steers <[email protected]>
- create_stream_template: Create/modify stream templates in manifests - create_auth_logic: Create/modify authentication logic for streams - test_auth_logic: Test auth without hitting endpoints or with specific endpoints - create_stream_from_template: Create streams from existing templates - test_stream_read: Test stream reads using CDK test-read functionality - Add Rick & Morty API connector example for testing (no auth required) - Add comprehensive test scripts for all new MCP actions - Update documentation with new tool descriptions and usage examples All tools follow connector context pattern with connector_name parameter. Supports both YAML and JSON output formats for flexibility. Co-Authored-By: AJ Steers <[email protected]>
… schema - Retrieves declarative_component_schema.yaml from CDK package using pkgutil - Supports both YAML and JSON output formats with format parameter - Handles datetime serialization for JSON conversion - Enables LLMs to access manifest validation schema for faster local iteration - Schema contains 100+ definitions for DeclarativeSource components Co-Authored-By: AJ Steers <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…/PyAirbyte into aj/feat/add-mcp-server
- Create new _local_dev.py module for development tools - Add template-based pipeline generation without OpenAI dependency - Generate complete PyAirbyte pipeline scripts with setup instructions - Register new development tools in MCP server Co-Authored-By: AJ Steers <[email protected]>
- Fix line length violations by using shorter variable names - Fix quote style issues (single to double quotes) - Fix whitespace on blank line - All local linting checks now pass Co-Authored-By: AJ Steers <[email protected]>
📝 WalkthroughWalkthroughThis update introduces new modules and functionality for local development operations in the MCP server, specifically adding coding tool registration and connector configuration tools. It provides a template-based pipeline generator for PyAirbyte, integrates these tools into the FastMCP app, and includes documentation templates and placeholders for future connector config operations. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant FastMCP_App
participant CodingTools
participant ConnectorConfigTools
User->>FastMCP_App: Start server
FastMCP_App->>CodingTools: register_coding_tools()
FastMCP_App->>ConnectorConfigTools: register_connector_config_tools()
User->>FastMCP_App: Request pipeline generation
FastMCP_App->>CodingTools: generate_pyairbyte_pipeline()
CodingTools-->>FastMCP_App: Return pipeline script & docs
FastMCP_App-->>User: Deliver generated pipeline and instructions
Would you like to see a more detailed breakdown of the pipeline generation flow as well, wdyt? ✨ Finishing Touches
🧪 Generate Unit Tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (9)
airbyte/secrets/util.py (1)
15-29
: Useful utility function with minor style improvement opportunity.The function correctly checks secret availability by calling
get_secret
withallow_prompt=False
. Would you consider simplifying the control flow by removing the unnecessaryelse
clause, wdyt?def is_secret_available( secret_name: str, ) -> bool: """Check if a secret is available in any of the configured secret sources. This function checks all available secret sources for the given secret name. If the secret is found in any source, it returns `True`; otherwise, it returns `False`. """ try: _ = get_secret(secret_name, allow_prompt=False) - except exc.PyAirbyteSecretNotFoundError: - return False - else: - # If no exception was raised, the secret was found. - return True + return True + except exc.PyAirbyteSecretNotFoundError: + return Falseairbyte/mcp/server.py (1)
25-37
: Good error handling with a minor improvement opportunity.The function handles both
KeyboardInterrupt
and general exceptions appropriately. Would you consider being more specific about which exceptions to catch in the general case, or perhaps logging the full traceback for debugging purposes, wdyt?def main() -> None: """Main entry point for the MCP server.""" print("Starting Airbyte MCP server.", file=sys.stderr) try: asyncio.run(app.run_stdio_async()) except KeyboardInterrupt: print("Airbyte MCP server interrupted by user.", file=sys.stderr) - except Exception as ex: - print(f"Error running Airbyte MCP server: {ex}", file=sys.stderr) + except Exception as ex: + import traceback + print(f"Error running Airbyte MCP server: {ex}", file=sys.stderr) + traceback.print_exc(file=sys.stderr) sys.exit(1) print("Airbyte MCP server stopped.", file=sys.stderr)airbyte/mcp/_local_dev.py (2)
47-52
: Consider consistent return structure for error cases.The error returns use
{"error": "..."}
structure while success returns{"code": "...", "instructions": "...", "filename": "..."}
. For better API consistency, should we standardize on a common response structure that includes status indicators, wdyt?Also applies to: 55-60
64-66
: Empty config dictionaries might be confusing in generated code.The template uses empty dictionaries
{}
as placeholders for source and destination configs. Users might not realize they need to fill these in. Could we add comments in the template or use more descriptive placeholders like{"config_key": "config_value"}
, wdyt?airbyte/mcp/_cloud_ops.py (2)
28-37
: Consider making api_root parameter optional with default.The
api_root
parameter is annotated as optional (str | None
) but seems to always be used with a fallback toCLOUD_API_ROOT
. Should we make it truly optional by defaulting toNone
in the function signature, wdyt?def get_cloud_sync_status( workspace_id: Annotated[str, Field(description="The ID of the Airbyte Cloud workspace.")], connection_id: Annotated[str, Field(description="The ID of the Airbyte Cloud connection.")], - api_root: Annotated[str | None, Field(description="Optional Cloud API root URL override.")], + api_root: Annotated[str | None, Field(description="Optional Cloud API root URL override.")] = None, job_id: Annotated[int | None, Field(description="Optional job ID. If not provided, the latest job will be used.")] = None, ) -> JobStatusEnum | None:
39-43
: Enhance docstring to explain secret fallback behavior.The docstring mentions environment variables but doesn't explain the full secret resolution process. Could we clarify that it will use any configured secret managers first, then fall back to environment variables, wdyt?
airbyte/mcp/_local_code_templates.py (1)
23-23
: Long lines in template stringsLines 23 and 32 exceed the 100 character limit. Since these are within template strings that will be written to files, would it make sense to keep them as-is to maintain the intended formatting in the generated scripts? Alternatively, we could break them into multiple lines, but that might affect the output formatting. wdyt?
Also applies to: 32-32
airbyte/secrets/hydration.py (1)
17-17
: Consider making the global mask URL configurable?The URL is currently hardcoded. Would it be beneficial to allow this to be configured via an environment variable for flexibility in different environments? wdyt?
airbyte/mcp/_local_ops.py (1)
87-111
: Consider extensibility for other secret managersThe function currently only handles GoogleGSMSecretManager. Would it make sense to add a comment indicating that support for other secret managers could be added here in the future, or perhaps create a more generic interface? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (22)
.ruff.toml
(0 hunks)airbyte/__init__.py
(2 hunks)airbyte/_connector_base.py
(5 hunks)airbyte/cloud/workspaces.py
(2 hunks)airbyte/constants.py
(1 hunks)airbyte/destinations/base.py
(1 hunks)airbyte/mcp/__init__.py
(1 hunks)airbyte/mcp/_cloud_ops.py
(1 hunks)airbyte/mcp/_connector_registry.py
(1 hunks)airbyte/mcp/_local_code_templates.py
(1 hunks)airbyte/mcp/_local_dev.py
(1 hunks)airbyte/mcp/_local_ops.py
(1 hunks)airbyte/mcp/_util.py
(1 hunks)airbyte/mcp/server.py
(1 hunks)airbyte/records.py
(1 hunks)airbyte/secrets/__init__.py
(1 hunks)airbyte/secrets/hydration.py
(1 hunks)airbyte/secrets/util.py
(2 hunks)airbyte/sources/base.py
(2 hunks)airbyte/sources/registry.py
(4 hunks)pyproject.toml
(3 hunks)tests/integration_tests/test_all_cache_types.py
(2 hunks)
💤 Files with no reviewable changes (1)
- .ruff.toml
🧰 Additional context used
🧠 Learnings (6)
tests/integration_tests/test_all_cache_types.py (8)
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#415
File: examples/run_perf_test_reads.py:117-127
Timestamp: 2024-10-09T19:21:45.994Z
Learning: In `examples/run_perf_test_reads.py`, the code for setting up Snowflake configuration in `get_cache` and `get_destination` cannot be refactored into a shared helper function because there are differences between them.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#347
File: tests/integration_tests/fixtures/registry.json:48-48
Timestamp: 2024-08-31T01:20:08.405Z
Learning: Test fixtures in the PyAirbyte project do not need to align with real Docker repositories.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#347
File: tests/integration_tests/fixtures/registry.json:48-48
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Test fixtures in the PyAirbyte project do not need to align with real Docker repositories.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#411
File: airbyte/sources/util.py:121-123
Timestamp: 2024-10-06T17:35:27.056Z
Learning: In `airbyte/sources/util.py`, the function `get_benchmark_source` is intended for benchmarking purposes in the current iteration. Future plans may include adding more generic functions like `get_dummy_source` or `get_mock_source` with broader functionality.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#411
File: airbyte/sources/util.py:121-123
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In `airbyte/sources/util.py`, the function `get_benchmark_source` is intended for benchmarking purposes in the current iteration. Future plans may include adding more generic functions like `get_dummy_source` or `get_mock_source` with broader functionality.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#347
File: tests/integration_tests/fixtures/registry.json:47-47
Timestamp: 2024-10-08T15:34:31.026Z
Learning: When reviewing changes in test fixtures, especially renaming, consider that they might be due to fixing copy-paste errors and may not impact core codepaths.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#347
File: tests/integration_tests/fixtures/registry.json:47-47
Timestamp: 2024-08-31T00:58:32.484Z
Learning: When reviewing changes in test fixtures, especially renaming, consider that they might be due to fixing copy-paste errors and may not impact core codepaths.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#417
File: airbyte/cli.py:503-504
Timestamp: 2024-10-10T16:17:57.989Z
Learning: In the PyAirbyte project, support for Python versions earlier than 3.10 is not necessary, as the project requires Python 3.10 or newer.
airbyte/destinations/base.py (2)
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-06T23:44:31.534Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.
airbyte/sources/base.py (3)
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-07-09T06:42:41.304Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#415
File: airbyte/cli.py:0-0
Timestamp: 2024-10-09T21:11:11.706Z
Learning: In `_resolve_source_job`, returning `None` when `config` is falsy distinguishes between empty config and not-yet-provided config, allowing `_resolve_config()` to handle non-null inputs effectively.
pyproject.toml (7)
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#417
File: airbyte/cli.py:503-504
Timestamp: 2024-10-10T16:17:57.989Z
Learning: In the PyAirbyte project, support for Python versions earlier than 3.10 is not necessary, as the project requires Python 3.10 or newer.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-07-09T06:37:29.133Z
Learning: In the PyAirbyte project, print statements are preferred over logging for consistency.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In the PyAirbyte project, print statements are preferred over logging for consistency.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:104-113
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In the PyAirbyte project, print statements are preferred over logging for consistency.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-18T07:00:43.413Z
Learning: In the PyAirbyte project, print statements are preferred over logging for consistency.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-07-09T06:37:48.088Z
Learning: In the PyAirbyte project, print statements are preferred over logging for consistency.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:104-113
Timestamp: 2024-07-09T06:38:54.843Z
Learning: In the PyAirbyte project, print statements are preferred over logging for consistency.
airbyte/records.py (2)
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-07-09T06:42:41.304Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.
airbyte/sources/registry.py (2)
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-07-09T06:42:41.304Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.
Learnt from: Suraj-Vishwakarma70
PR: airbytehq/PyAirbyte#285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.
🧬 Code Graph Analysis (11)
tests/integration_tests/test_all_cache_types.py (3)
airbyte/sources/util.py (1)
get_source
(47-123)tests/integration_tests/test_source_faker_integration.py (1)
source_faker_seed_a
(57-68)airbyte/_connector_base.py (1)
get_config
(131-143)
airbyte/destinations/base.py (1)
airbyte/_connector_base.py (1)
_hydrated_config
(146-151)
airbyte/sources/base.py (2)
airbyte/_util/temp_files.py (1)
as_temp_files
(23-66)airbyte/_connector_base.py (1)
_hydrated_config
(146-151)
airbyte/mcp/server.py (5)
airbyte/mcp/_cloud_ops.py (1)
register_cloud_ops_tools
(59-61)airbyte/mcp/_connector_registry.py (1)
register_connector_registry_tools
(126-129)airbyte/mcp/_local_dev.py (1)
register_local_dev_tools
(84-86)airbyte/mcp/_local_ops.py (1)
register_local_ops_tools
(279-293)airbyte/mcp/_util.py (1)
initialize_secrets
(29-43)
airbyte/records.py (1)
airbyte/_util/name_normalizers.py (2)
normalize
(23-25)normalize
(53-87)
airbyte/cloud/workspaces.py (1)
airbyte/_connector_base.py (1)
_hydrated_config
(146-151)
airbyte/mcp/_cloud_ops.py (4)
airbyte/cloud/workspaces.py (1)
CloudWorkspace
(56-444)airbyte/secrets/util.py (1)
get_secret
(32-109)airbyte/cloud/sync_results.py (2)
SyncResult
(129-353)get_job_status
(179-181)airbyte/cloud/connections.py (1)
get_sync_result
(198-223)
airbyte/_connector_base.py (4)
airbyte/secrets/hydration.py (1)
hydrate_secrets
(40-55)airbyte/exceptions.py (1)
AirbyteConnectorConfigurationMissingError
(257-260)airbyte/_util/hashing.py (1)
one_way_hash
(14-35)airbyte/_util/temp_files.py (1)
as_temp_files
(23-66)
airbyte/secrets/hydration.py (2)
airbyte/exceptions.py (1)
PyAirbyteInternalError
(190-194)airbyte/secrets/util.py (1)
get_secret
(32-109)
airbyte/mcp/_connector_registry.py (2)
airbyte/sources/registry.py (3)
get_available_connectors
(236-293)ConnectorMetadata
(72-103)get_connector_metadata
(206-233)airbyte/_connector_base.py (4)
install
(359-362)config_spec
(235-244)name
(83-85)docs_url
(295-300)
airbyte/mcp/_local_dev.py (1)
airbyte/sources/registry.py (1)
get_available_connectors
(236-293)
🪛 Pylint (3.3.7)
airbyte/secrets/util.py
[refactor] 23-29: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
airbyte/sources/registry.py
[refactor] 72-72: Too few public methods (1/2)
(R0903)
airbyte/mcp/_connector_registry.py
[refactor] 84-84: Too few public methods (0/2)
(R0903)
airbyte/mcp/_local_ops.py
[refactor] 199-222: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🪛 Ruff (0.11.9)
airbyte/mcp/_local_code_templates.py
23-23: Line too long (109 > 100)
(E501)
32-32: Line too long (105 > 100)
(E501)
🔇 Additional comments (44)
airbyte/__init__.py (1)
158-158
: LGTM! Clean integration of the MCP module.The addition follows the established pattern for exposing submodules in the public API. Nice and consistent!
Also applies to: 179-179
tests/integration_tests/test_all_cache_types.py (2)
20-23
: Clean import organization.The explicit imports and reorganization improve readability. Good refactoring!
144-144
: Excellent migration to public API.Switching from the private
_config
attribute to the publicget_config()
method is the right approach. This aligns well with the secret hydration changes throughout the codebase, wdyt?airbyte/destinations/base.py (1)
267-267
: Perfect use of hydrated config for connector execution.Using
_hydrated_config
instead of_config
ensures that secret references are properly resolved before being passed to the destination connector. This is exactly what we want for secure operations, wdyt?pyproject.toml (2)
78-78
: Good use of extras for development dependency.Adding
pydantic-ai
with themcp
extra as a dev dependency is a clean approach for MCP-related development tooling.
178-178
: CLI script entry point follows established patterns.The
airbyte-mcp
script entry point is consistent with the existingpyairbyte
andpyab
scripts. Nice consistency!airbyte/constants.py (1)
128-143
: Excellent documentation for the secret hydration feature.The
SECRETS_HYDRATION_PREFIX
constant is well-defined with comprehensive documentation. The JSON example makes it crystal clear how to use secret references in configuration. The docstring follows good practices by referencing the relevant module for more details, wdyt?airbyte/sources/base.py (2)
295-295
: LGTM! Security improvement by using hydrated config for discovery.This change ensures that secret references are properly resolved before passing the config to the connector subprocess during discovery operations. This aligns well with the new secret hydration architecture introduced in the PR.
641-641
: LGTM! Consistent use of hydrated config for read operations.Similar to the discovery method, this ensures that secret references are resolved before execution. This maintains consistency across all connector operations and improves security by preventing raw secret references from being passed to subprocesses.
airbyte/sources/registry.py (3)
15-15
: Great migration to Pydantic BaseModel!The switch from dataclass to Pydantic provides better data validation, serialization, and integration with the MCP tooling. This is a solid architectural improvement, wdyt?
Also applies to: 72-72
90-92
: Nice addition of suggested streams metadata.The new
suggested_streams
field will be helpful for MCP tools to provide better guidance to users about which streams to select. The optional typing with default None is appropriate here.
157-157
: Correct extraction of nested suggested streams.The extraction logic properly handles the nested
"suggestedStreams" -> "streams"
path from the registry entry. The use of.get()
with fallback to None is safe and appropriate for optional data.airbyte/secrets/util.py (1)
49-53
: Excellent secret reference handling implementation.The logic properly strips the
SECRETS_HYDRATION_PREFIX
and handles the secret reference format. This integrates well with the broader secret hydration architecture introduced in this PR.airbyte/records.py (1)
151-155
: Excellent defensive programming improvement!The switch from direct dictionary access to
.get()
with a fallback prevents potentialKeyError
exceptions when keys are missing from the pretty case lookup. This makes the method more robust and aligns with the updated docstring, wdyt?airbyte/mcp/server.py (1)
16-22
: Well-structured MCP server initialization!The module-level initialization of secrets and tool registration is clean and follows a logical order. The separation of tools into different modules promotes good code organization.
airbyte/secrets/__init__.py (1)
59-83
: Excellent documentation of the new secret reference feature!The documentation clearly explains the secret reference mechanism and provides a helpful JSON example. One minor suggestion - would it be worth mentioning that secret references are processed recursively in nested configurations, wdyt? This could help users understand that the feature works with complex nested config structures.
airbyte/cloud/workspaces.py (2)
160-160
: Good alignment with the secret hydration architecture.Using
_hydrated_config
ensures that secrets are properly resolved before deployment to the cloud. The lint suppression is appropriate here since this is an intentional use of the internal API designed for this purpose.
208-208
: Consistent pattern for destination deployment.Same good pattern as the source deployment - using the hydrated config ensures secrets are resolved for cloud deployment operations.
airbyte/_connector_base.py (6)
127-127
: Smart approach to validate hydrated config during setup.Validating the hydrated config ensures that secret references are resolved and valid before storing the raw config. This catches secret resolution issues early, wdyt?
132-143
: Well-documented distinction between raw and hydrated config.The documentation clearly explains that
get_config()
returns the raw config without secrets hydrated. This preserves the external API contract while enabling internal secret hydration.
145-151
: Clean internal API for hydrated configuration.The
_hydrated_config
property provides a clean interface for internal operations that need resolved secrets. The property pattern ensures lazy evaluation and consistency.
163-163
: Good choice to hash hydrated config.Using hydrated config for hashing ensures that the hash represents the actual configuration that will be used by the connector, including resolved secrets. This makes the hash more meaningful for caching and comparison purposes.
175-175
: Consistent pattern for validation with secret hydration.The validation logic correctly handles both explicit config parameters and stored config, ensuring secrets are hydrated in both cases before validation.
323-323
: Appropriate use of hydrated config for connector execution.Using
_hydrated_config
for the check operation ensures the connector receives fully resolved configuration with secrets available.airbyte/mcp/_local_dev.py (1)
84-86
: Clean registration pattern for MCP tools.The registration function follows a clean pattern for integrating with FastMCP. This looks good for modularity and testing.
airbyte/mcp/_cloud_ops.py (2)
54-56
: Clean and concise sync status retrieval logic.The logic for getting sync results and returning job status is well-structured and handles the optional job_id parameter correctly.
59-61
: Consistent registration pattern with other MCP modules.The registration function follows the same clean pattern as the local dev tools, maintaining consistency across the MCP module architecture.
airbyte/mcp/__init__.py (1)
174-180
: Clean module structure!The module properly imports and exports the server submodule with clear documentation.
airbyte/mcp/_local_code_templates.py (2)
1-1
: Copyright year appears to be in the future?The copyright year shows 2025. Should this be 2024 to match the current timeframe? wdyt?
178-190
: Well-structured template generation functionThe
get_full_script_text
function provides a clean interface for generating scripts with proper parameter substitution.airbyte/mcp/_util.py (2)
19-27
: Robust dotenv loading with proper error handlingGood implementation with type flexibility and clear error messaging.
46-106
: Comprehensive config resolution with security considerationsExcellent implementation that:
- Handles multiple config sources (dict, Path, secret name)
- Detects and rejects hardcoded secrets
- Provides helpful error messages with guidance on using environment variables
- Properly merges configurations using deep_update
airbyte/secrets/hydration.py (2)
20-55
: Well-implemented recursive hydration logicThe hydration functions properly handle nested structures and create deep copies to avoid mutating the original config.
140-171
: Excellent secret detection implementationThe function comprehensively detects hardcoded secrets by checking against both global and connector-specific masks, encouraging the use of secret references.
airbyte/mcp/_connector_registry.py (4)
1-1
: Inconsistent copyright yearThis file shows 2024 while other new files show 2025. Should these be consistent? wdyt?
84-91
: Pylint warning is a false positiveThe warning about too few public methods can be safely ignored -
ConnectorInfo
is a Pydantic data model that doesn't require additional methods.
108-116
: Smart use of contextlib.suppress for resilient metadata retrievalGood defensive programming - the function continues gracefully even if metadata or config spec retrieval fails.
18-81
: Well-designed connector filtering logicThe
list_connectors
function provides flexible filtering with clear logic flow - filtering by install types, then keyword, then connector type. The documentation is comprehensive and helpful.airbyte/mcp/_local_ops.py (6)
1-43
: Clear and comprehensive configuration documentation!The CONFIG_HELP provides excellent guidance on how to use configuration options, including secret references and layering configs. This will be very helpful for users.
45-85
: Well-structured validation function with proper error handling!The function cleanly separates concerns: connector loading, config resolution, and validation check. The tuple return pattern (is_valid, message) is clear and useful.
113-142
: Clean stream listing implementation!Good job on keeping this function focused and simple. The configuration resolution and stream retrieval flow is clear.
144-172
: Good schema retrieval implementation!The function follows the same clean pattern as the others, with proper config resolution before accessing stream metadata.
225-277
: Excellent sync implementation with smart defaults!I really like the "suggested" streams feature with graceful fallback to all streams. The flexible stream parameter handling (string or list) is also user-friendly.
279-294
: Clean tool registration logic!Good separation between the tool that doesn't need CONFIG_HELP (list_connector_config_secrets) and those that do. The description concatenation approach is elegant.
airbyte/mcp/_cloud_ops.py
Outdated
client_id=secrets.get_secret("AIRBYTE_CLIENT_ID"), | ||
client_secret=secrets.get_secret("AIRBYTE_CLIENT_SECRET"), | ||
api_root=api_root or CLOUD_API_ROOT, # Defaults to the Airbyte Cloud API root if None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding error handling for secret retrieval.
The secret retrieval for client credentials could fail if the secrets are not available. Should we add try-catch blocks to provide more helpful error messages when authentication secrets are missing, wdyt?
+ try:
client_id=secrets.get_secret("AIRBYTE_CLIENT_ID"),
client_secret=secrets.get_secret("AIRBYTE_CLIENT_SECRET"),
+ except Exception as e:
+ raise ValueError(f"Failed to retrieve Airbyte Cloud credentials: {e}") from e
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In airbyte/mcp/_cloud_ops.py around lines 48 to 50, the calls to
secrets.get_secret for client_id and client_secret may fail if the secrets are
missing. Add try-except blocks around these secret retrieval calls to catch
exceptions and raise or log more informative error messages indicating which
secret is missing or failed to load, improving error clarity during
authentication setup.
airbyte/secrets/hydration.py
Outdated
response = requests.get( | ||
GLOBAL_MASK_KEYS_URL, | ||
allow_redirects=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add timeout to HTTP request?
The request doesn't specify a timeout, which could cause the application to hang if the endpoint is unresponsive. Consider adding a reasonable timeout value? wdyt?
response = requests.get(
GLOBAL_MASK_KEYS_URL,
allow_redirects=True,
+ timeout=30, # 30 seconds timeout
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
response = requests.get( | |
GLOBAL_MASK_KEYS_URL, | |
allow_redirects=True, | |
) | |
response = requests.get( | |
GLOBAL_MASK_KEYS_URL, | |
allow_redirects=True, | |
timeout=30, # 30 seconds timeout | |
) |
🤖 Prompt for AI Agents
In airbyte/secrets/hydration.py around lines 100 to 103, the requests.get call
lacks a timeout parameter, which can cause the application to hang if the
endpoint is unresponsive. Add a timeout argument with a reasonable value (e.g.,
timeout=10) to the requests.get call to prevent indefinite waiting.
airbyte/mcp/_local_ops.py
Outdated
# Next we load a limited number of records from the generator into our list. | ||
records: list[dict[str, Any]] = list(islice(record_generator, max_records)) | ||
|
||
print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix stderr print syntax
The print statement is missing the file=
parameter for stderr output. wdyt about updating it?
- print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr)
+ print(f"Retrieved {len(records)} records from stream '{stream_name}'", file=sys.stderr)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr) | |
print(f"Retrieved {len(records)} records from stream '{stream_name}'", file=sys.stderr) |
🤖 Prompt for AI Agents
In airbyte/mcp/_local_ops.py at line 212, the print statement intended to output
to stderr is missing the 'file=' parameter. Update the print call to include
'file=sys.stderr' to correctly direct the output to the standard error stream.
airbyte/mcp/_local_ops.py
Outdated
) -> list[dict[str, Any]] | str: | ||
"""Get records from a source connector.""" | ||
try: | ||
source = get_source(source_connector_name) | ||
config_dict = resolve_config( | ||
config=config, | ||
config_secret_name=config_secret_name, | ||
config_spec_jsonschema=source.config_spec, | ||
) | ||
source.set_config(config_dict) | ||
# First we get a generator for the records in the specified stream. | ||
record_generator = source.get_records(stream_name) | ||
# Next we load a limited number of records from the generator into our list. | ||
records: list[dict[str, Any]] = list(islice(record_generator, max_records)) | ||
|
||
print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr) | ||
|
||
except Exception as ex: | ||
tb_str = traceback.format_exc() | ||
# If any error occurs, we print the error message to stderr and return an empty list. | ||
return ( | ||
f"Error reading records from source '{source_connector_name}': {ex!r}, {ex!s}\n{tb_str}" | ||
) | ||
|
||
else: | ||
return records | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider consistent return types and remove unnecessary else
Two suggestions here:
- The function returns either
list[dict[str, Any]]
orstr
, which could make it harder for callers to handle. Would it be better to raise an exception or return a typed result object? wdyt? - The static analysis correctly identifies that the
else
block is unnecessary after thereturn
statement.
except Exception as ex:
tb_str = traceback.format_exc()
# If any error occurs, we print the error message to stderr and return an empty list.
return (
f"Error reading records from source '{source_connector_name}': {ex!r}, {ex!s}\n{tb_str}"
)
- else:
- return records
+ return records
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
) -> list[dict[str, Any]] | str: | |
"""Get records from a source connector.""" | |
try: | |
source = get_source(source_connector_name) | |
config_dict = resolve_config( | |
config=config, | |
config_secret_name=config_secret_name, | |
config_spec_jsonschema=source.config_spec, | |
) | |
source.set_config(config_dict) | |
# First we get a generator for the records in the specified stream. | |
record_generator = source.get_records(stream_name) | |
# Next we load a limited number of records from the generator into our list. | |
records: list[dict[str, Any]] = list(islice(record_generator, max_records)) | |
print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr) | |
except Exception as ex: | |
tb_str = traceback.format_exc() | |
# If any error occurs, we print the error message to stderr and return an empty list. | |
return ( | |
f"Error reading records from source '{source_connector_name}': {ex!r}, {ex!s}\n{tb_str}" | |
) | |
else: | |
return records | |
) -> list[dict[str, Any]] | str: | |
"""Get records from a source connector.""" | |
try: | |
source = get_source(source_connector_name) | |
config_dict = resolve_config( | |
config=config, | |
config_secret_name=config_secret_name, | |
config_spec_jsonschema=source.config_spec, | |
) | |
source.set_config(config_dict) | |
# First we get a generator for the records in the specified stream. | |
record_generator = source.get_records(stream_name) | |
# Next we load a limited number of records from the generator into our list. | |
records: list[dict[str, Any]] = list(islice(record_generator, max_records)) | |
print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr) | |
except Exception as ex: | |
tb_str = traceback.format_exc() | |
# If any error occurs, we print the error message to stderr and return an empty list. | |
return ( | |
f"Error reading records from source '{source_connector_name}': {ex!r}, {ex!s}\n{tb_str}" | |
) | |
return records |
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 199-222: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🤖 Prompt for AI Agents
In airbyte/mcp/_local_ops.py around lines 197 to 223, the function returns
either a list of records or a string error message, which complicates handling
for callers. Refactor the function to raise exceptions instead of returning
error strings to maintain consistent return types. Also, remove the unnecessary
else block after the try-except since the return in except exits the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte/mcp/_local_code_templates.py (3)
23-23
: Line length exceeds 100 characters - could we break this up?The logging warning message is quite long. What do you think about breaking it into multiple lines for better readability, wdyt?
- logging.warning("'.env' file not found. Please ensure it exists and contains the necessary credentials.") + logging.warning( + "'.env' file not found. Please ensure it exists and contains the necessary credentials." + )
32-32
: Another line length issue - could we format this more readably?Similar to the previous comment, this error message is quite long. Would you consider breaking it up for better readability, wdyt?
- sys.exit(f"Error: Environment variable '{var_name}' not set. Please add it to your .env file.") + sys.exit( + f"Error: Environment variable '{var_name}' not set. " + f"Please add it to your .env file." + )
31-31
: Consider using f-strings consistently throughout the templateI noticed mixed usage of f-strings and
.format()
placeholders. For consistency within the generated script, would you consider standardizing on f-strings throughout, wdyt? This would make the generated code more modern and consistent.For example:
- logging.error(f"Missing required environment variable: {var_name}") + logging.error(f"Missing required environment variable: {var_name}") - logging.info(f"Using default value for optional environment variable: {var_name}") + logging.info(f"Using default value for optional environment variable: {var_name}") - logging.info(f"Optional environment variable not set: {var_name}") + logging.info(f"Optional environment variable not set: {var_name}")Also applies to: 39-39, 42-42
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/_local_code_templates.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
airbyte/mcp/_local_code_templates.py
23-23: Line too long (109 > 100)
(E501)
32-32: Line too long (105 > 100)
(E501)
🪛 GitHub Actions: Run Linters
airbyte/mcp/_local_code_templates.py
[warning] 23-23: Ruff: Line too long (109 > 100) (E501)
[warning] 32-32: Ruff: Line too long (105 > 100) (E501)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (8)
airbyte/mcp/_local_code_templates.py (8)
1-3
: Nice file structure and clear purpose!The copyright header and docstring clearly establish the file's purpose. Good foundation for the template system.
4-26
: Clean template setup with good error handlingThe script template has solid foundations with proper imports, logging configuration, and environment file handling. The optional exit approach for missing .env files is thoughtful - gives users flexibility while providing clear warnings.
28-44
: Excellent environment variable helper functionsThese utility functions provide robust error handling and clear logging. The distinction between required and optional variables is well-implemented. The default value handling in
get_optional_env
is particularly thoughtful.
46-80
: Comprehensive source configuration with excellent error handlingThe source configuration section is well-structured with proper exception handling, connection verification, and stream selection. The comments providing alternative stream selection options are helpful for users. The logging throughout provides good visibility into the pipeline execution.
82-119
: Well-designed data processing section with good user guidanceThe data reading and DataFrame conversion logic is solid. I particularly appreciate:
- Clear error handling for each stream conversion
- Informative logging with row counts
- The prominent comment directing users to add their processing logic
- Defensive programming with the
if results.streams:
check
121-151
: Thoughtful destination handling with practical considerationsThe destination configuration mirrors the source setup well. The commented-out alternative approach for destination check failures (lines 147-149) shows good awareness of real-world connector behavior. The conservative approach of exiting by default is appropriate for safety.
153-174
: Clean pipeline execution with appropriate error handlingThe final pipeline execution section is well-structured. The separation of read and write operations with individual logging provides good visibility. The error handling ensures failures are properly logged and the script exits gracefully.
178-210
: Comprehensive documentation templateThe documentation template covers all essential aspects:
- Installation instructions
- Configuration guidance
- Running instructions
- Best practices
- Helpful links to official documentation
This will provide users with a complete getting-started experience. The use of placeholders for dynamic content is well-designed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (1)
airbyte/mcp/_coding.py (1)
39-44
: Consider edge cases in pipeline name generation?The pipeline name generation logic looks good, but I'm wondering about potential edge cases. What happens if the connector names contain special characters or if the generated
pipeline_id
becomes too long for filesystem constraints? Should we add some validation or sanitization here? wdyt?pipeline_id = pipeline_name.lower().replace(" ", "_").replace("'", "") +# Sanitize pipeline_id to ensure it's filesystem-safe +pipeline_id = "".join(c for c in pipeline_id if c.isalnum() or c in "_-") +pipeline_id = pipeline_id[:50] # Limit length for filesystem compatibility
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte/mcp/_coding.py
(1 hunks)airbyte/mcp/_coding_templates.py
(1 hunks)airbyte/mcp/_connector_config.py
(1 hunks)airbyte/mcp/server.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/server.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/mcp/_coding.py (1)
airbyte/sources/registry.py (1)
get_available_connectors
(236-293)
🪛 Ruff (0.11.9)
airbyte/mcp/_coding_templates.py
23-23: Line too long (109 > 100)
(E501)
32-32: Line too long (105 > 100)
(E501)
airbyte/mcp/_connector_config.py
4-4: typing.Annotated
imported but unused
Remove unused import: typing.Annotated
(F401)
7-7: pydantic.Field
imported but unused
Remove unused import: pydantic.Field
(F401)
🪛 GitHub Actions: Run Linters
airbyte/mcp/_coding_templates.py
[error] 23-23: Ruff E501: Line too long (109 > 100).
[error] 32-32: Ruff E501: Line too long (105 > 100).
airbyte/mcp/_connector_config.py
[error] 4-4: Ruff F401: typing.Annotated
imported but unused. Remove unused import.
[error] 7-7: Ruff F401: pydantic.Field
imported but unused. Remove unused import.
🪛 Flake8 (7.2.0)
airbyte/mcp/_connector_config.py
[error] 4-4: 'typing.Annotated' imported but unused
(F401)
[error] 7-7: 'pydantic.Field' imported but unused
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (6)
airbyte/mcp/_connector_config.py (1)
10-12
: Placeholder function looks good for now.The placeholder structure is clean and ready for future implementation. The docstring accurately describes the intended purpose.
airbyte/mcp/_coding.py (3)
13-28
: Great function signature with comprehensive annotations.The use of
Annotated
withField
descriptions provides excellent type hints and documentation for the MCP tool interface. This will help users understand what each parameter expects.
77-81
: Excellent return structure for MCP tool.The return dictionary structure is well-designed, providing the generated code, setup instructions, and a suggested filename. This gives users everything they need to get started.
84-86
: Clean tool registration implementation.The registration function is concise and follows the FastMCP pattern correctly.
airbyte/mcp/_coding_templates.py (2)
4-175
: Comprehensive and well-structured script template.The generated script template is excellent! It includes:
- Proper error handling and logging
- Environment variable management with helpful utilities
- Connection verification for both source and destination
- Flexible stream selection
- Data processing with DataFrame conversion
- Clear comments and structure
This will generate very usable pipeline scripts for users.
178-210
: Helpful documentation template with clear instructions.The documentation template provides excellent step-by-step instructions covering installation, configuration, and usage. The links to connector-specific documentation are particularly valuable.
pipeline_code: str = SCRIPT_TEMPLATE.format( | ||
source_connector_name=source_connector_name, | ||
source_config_dict={}, # Placeholder for source config | ||
destination_connector_name=destination_connector_name, | ||
destination_config_dict={}, # Placeholder for destination config | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Check template variable consistency?
I noticed the template is being formatted with source_connector_name
and destination_connector_name
, but looking at the template in _coding_templates.py
, it uses {source_name}
and {destination_name}
. Should we verify these variable names match between the template and the formatting call? wdyt?
🏁 Script executed:
#!/bin/bash
# Check template variable consistency between files
echo "Variables used in _coding.py formatting:"
grep -n "\.format(" airbyte/mcp/_coding.py -A 5
echo -e "\nVariables expected in template:"
grep -n "{.*}" airbyte/mcp/_coding_templates.py | head -20
Length of output: 2307
Align pipeline template variables with SCRIPT_TEMPLATE
placeholders
It looks like SCRIPT_TEMPLATE
in _coding_templates.py
uses {source_name}
and {destination_name}
(along with {source_config_dict}
and {destination_config_dict}
), but in airbyte/mcp/_coding.py
(lines 62–67) we’re calling:
SCRIPT_TEMPLATE.format(
source_connector_name=source_connector_name,
source_config_dict={},
destination_connector_name=destination_connector_name,
destination_config_dict={},
)
This mismatch will raise a KeyError
at runtime. Should we update it to:
- pipeline_code: str = SCRIPT_TEMPLATE.format(
- source_connector_name=source_connector_name,
- source_config_dict={},
- destination_connector_name=destination_connector_name,
- destination_config_dict={},
- )
+ pipeline_code: str = SCRIPT_TEMPLATE.format(
+ source_name=source_connector_name,
+ source_config_dict={},
+ destination_name=destination_connector_name,
+ destination_config_dict={},
+ )
wdyt?
🤖 Prompt for AI Agents
In airbyte/mcp/_coding.py around lines 62 to 67, the keys used in
SCRIPT_TEMPLATE.format do not match the placeholders defined in SCRIPT_TEMPLATE,
causing a KeyError. Update the keys in the format call to match the placeholders
exactly: replace source_connector_name with source_name and
destination_connector_name with destination_name, while keeping
source_config_dict and destination_config_dict as is.
if not load_dotenv(): | ||
logging.warning("'.env' file not found. Please ensure it exists and contains the necessary credentials.") | ||
# Optionally exit if .env is strictly required | ||
# sys.exit("'.env' file is required. Please create it with the necessary credentials.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix line length to satisfy linting rules?
The linter is flagging line 23 as too long (109 > 100 characters). Should we break this into multiple lines? wdyt?
- logging.warning("'.env' file not found. Please ensure it exists and contains the necessary credentials.")
+ logging.warning(
+ "'.env' file not found. Please ensure it exists and contains the necessary credentials."
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if not load_dotenv(): | |
logging.warning("'.env' file not found. Please ensure it exists and contains the necessary credentials.") | |
# Optionally exit if .env is strictly required | |
# sys.exit("'.env' file is required. Please create it with the necessary credentials.") | |
if not load_dotenv(): | |
logging.warning( | |
"'.env' file not found. Please ensure it exists and contains the necessary credentials." | |
) | |
# Optionally exit if .env is strictly required | |
# sys.exit("'.env' file is required. Please create it with the necessary credentials.") |
🧰 Tools
🪛 Ruff (0.11.9)
23-23: Line too long (109 > 100)
(E501)
🪛 GitHub Actions: Run Linters
[error] 23-23: Ruff E501: Line too long (109 > 100).
🤖 Prompt for AI Agents
In airbyte/mcp/_coding_templates.py around lines 22 to 25, the logging.warning
message on line 23 exceeds the 100 character limit set by the linter. To fix
this, break the long string into multiple concatenated strings or use implicit
string concatenation with parentheses to split the message across multiple
lines, ensuring each line stays within the character limit.
destination_name = "{destination_name}" | ||
logging.info(f"Configuring destination: {{destination_name}}") | ||
dest_config = {{ | ||
{destination_config_dict} | ||
}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same template variable issue for destination.
Similar to the source configuration, the destination template uses {destination_name}
but the formatting call uses destination_connector_name
. Should we fix this consistency issue? wdyt?
-destination_name = "{destination_name}"
+destination_name = "{destination_connector_name}"
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In airbyte/mcp/_coding_templates.py around lines 121 to 125, the template
variable for the destination name is inconsistent: the string uses
{destination_name} but the formatting call uses destination_connector_name. To
fix this, ensure both the template string and the formatting call use the same
variable name, either rename the template variable to
{destination_connector_name} or change the formatting call to use
destination_name for consistency.
logging.error(f"Missing required environment variable: {{var_name}}") | ||
sys.exit(f"Error: Environment variable '{{var_name}}' not set. Please add it to your .env file.") | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix line length for consistency with linting rules?
Similar to the previous issue, line 32 is flagged as too long (105 > 100 characters). Should we break this line as well? wdyt?
- sys.exit(f"Error: Environment variable '{var_name}' not set. Please add it to your .env file.")
+ sys.exit(
+ f"Error: Environment variable '{var_name}' not set. Please add it to your .env file."
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
logging.error(f"Missing required environment variable: {{var_name}}") | |
sys.exit(f"Error: Environment variable '{{var_name}}' not set. Please add it to your .env file.") | |
return value | |
logging.error(f"Missing required environment variable: {{var_name}}") | |
sys.exit( | |
f"Error: Environment variable '{var_name}' not set. Please add it to your .env file." | |
) | |
return value |
🧰 Tools
🪛 Ruff (0.11.9)
32-32: Line too long (105 > 100)
(E501)
🪛 GitHub Actions: Run Linters
[error] 32-32: Ruff E501: Line too long (105 > 100).
🤖 Prompt for AI Agents
In airbyte/mcp/_coding_templates.py around lines 31 to 33, the sys.exit call on
line 32 exceeds the 100 character limit. To fix this, break the long string into
multiple concatenated shorter strings or use implicit string concatenation to
keep each line within the limit, ensuring the message remains clear and
readable.
source_name = "{source_name}" | ||
logging.info(f"Configuring source: {{source_name}}") | ||
source_config = {{ | ||
{source_config_dict} | ||
}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Template variable mismatch detected.
I notice the template uses {source_name}
here, but in _coding.py
the formatting call uses source_connector_name
. This mismatch will cause the template formatting to fail. Should we align these variable names? wdyt?
-source_name = "{source_name}"
+source_name = "{source_connector_name}"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
source_name = "{source_name}" | |
logging.info(f"Configuring source: {{source_name}}") | |
source_config = {{ | |
{source_config_dict} | |
}} | |
source_name = "{source_connector_name}" | |
logging.info(f"Configuring source: {{source_name}}") | |
source_config = { | |
{source_config_dict} | |
} |
🤖 Prompt for AI Agents
In airbyte/mcp/_coding_templates.py around lines 47 to 51, the template variable
`source_name` is used but the formatting call in _coding.py uses
`source_connector_name`, causing a mismatch. To fix this, rename the template
variable from `source_name` to `source_connector_name` to match the formatting
call, ensuring consistent variable names across the template and code.
Summary by CodeRabbit