Skip to content

feat: add support for complex types in component processing #9305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

YassinNouh21
Copy link
Contributor

Related Issues

Proposed Changes:

The ComponentTool class was incorrectly handling complex types like dictionaries and unions when generating parameter schemas. Specifically:

  1. Fixed the schema generation for dictionary types (Dict[str, Any]) by:

    • Adding proper "type": "object" schema
    • Setting "additionalProperties": true to allow arbitrary key-value pairs
  2. Added proper handling for Union types by:

    • Implementing oneOf schema generation for Union types
    • Correctly handling Union[Dict[str, Any], List[Dict[str, Any]]] type hints
    • Removing redundant descriptions in nested schemas

These changes ensure that tools like HTMLToDocument's parameters are correctly represented in the schema, allowing LLMs to properly understand and use the parameters.

How did you test it?

  1. Added a new comprehensive unit test test_from_component_with_complex_types that verifies:

    • Correct schema generation for dictionary parameters
    • Proper handling of Union types with both dict and list[dict] variants
    • Parameter validation during tool invocation
    • Successful execution with both dictionary and list inputs
  2. Ran all existing tests to ensure no regressions

  3. Verified pre-commit hooks pass including:

    • Type checking
    • Code formatting
    • Linting

Notes for the reviewer

Key areas to review:

  • The _create_property_schema method in component_tool.py where the main changes were made
  • The new test case in test_component_tool.py that verifies the fix
  • The handling of Union types with the oneOf schema generation

Checklist

@YassinNouh21 YassinNouh21 requested a review from a team as a code owner April 24, 2025 12:24
@YassinNouh21 YassinNouh21 requested review from anakin87 and removed request for a team April 24, 2025 12:24
@anakin87 anakin87 requested a review from sjrl April 24, 2025 12:27
@@ -412,6 +412,23 @@ def _create_property_schema(self, python_type: Any, description: str, default: A
f"Pydantic models (e.g. {python_type.__name__}) are not supported as input types for "
f"component's run method."
)
elif python_type is dict or (origin is dict):
schema = {"type": "object", "description": description, "additionalProperties": True}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anakin87 do you happen to know if by adding additionalProperties here we run the risk of breaking ChatGenerators that don't support it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure - it might be better to avoid it if not strictly necessary

I found this: https://platform.openai.com/docs/guides/structured-outputs/additionalproperties-false-must-always-be-set-in-objects

Anyway, it might be a good idea to involve @vblagoje who worked much on this, if we are not in a rush

Copy link
Contributor

@sjrl sjrl Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the structured output or if you set the JSON response to strict. But that might be a separate issue we need to open where if structured output or strict is set in OpenAIChatGenerator then it should probably go in and update all additionalProperties to false. I think this would need to be in the ChatGenerator since the ComponenTool (or any Tool) can't know yet if additionalProperties must be set to False.

@@ -412,6 +412,23 @@ def _create_property_schema(self, python_type: Any, description: str, default: A
f"Pydantic models (e.g. {python_type.__name__}) are not supported as input types for "
f"component's run method."
)
elif python_type is dict or (origin is dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are handling dict here now could we also remove dict from the _create_basic_type_schema function?

Comment on lines +363 to +376
# Check the parameter schema
assert "meta" in tool.parameters["properties"]
assert "extraction_kwargs" in tool.parameters["properties"]

# Meta should be oneOf with both object and array options
meta_schema = tool.parameters["properties"]["meta"]
assert meta_schema["description"].startswith("Optional metadata")
assert "oneOf" in meta_schema

# extraction_kwargs should be an object
kwargs_schema = tool.parameters["properties"]["extraction_kwargs"]
assert kwargs_schema["type"] == "object"
assert "additionalProperties" in kwargs_schema
assert kwargs_schema["additionalProperties"] is True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the additional comments and individual asserts, but could we move update this to a full dict comparison. So

assert tool.parameters["properties"] == {...}

I think that would make it easier to judge at a glance.

Copy link
Contributor Author

@YassinNouh21 YassinNouh21 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better for readability. But no problem, I will do it

@github-actions github-actions bot added the type:documentation Improvements on the docs label Apr 24, 2025
Comment on lines +444 to +448
if origin is dict and args and args[0] is str and args[1] is Any:
if description and "meta" in description.lower():
schema = {"type": "string", "description": description}
else:
schema = {"type": "object", "description": description, "additionalProperties": True}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity why do we need this special case?

Copy link
Contributor Author

@YassinNouh21 YassinNouh21 Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! The reason we single out pure Dict[str, Any] parameters whose description mentions “meta” is that metadata blobs by definition have an arbitrary, unknown shape, and we don’t want to bake an open-ended object schema (with unpredictable keys) into our function spec.

  • If I treated metadata as a normal object with additionalProperties: true, under tools_strict I would have to enumerate every possible metadata key (which we can’t know ahead of time), and LLMs often struggle to fill such free-form nested schemas.
  • All other dicts (that aren’t metadata) still fall back to an object schema with additionalProperties: true.

@YassinNouh21
Copy link
Contributor Author

@sjrl — Could you please take a look and let me know if this now aligns with your requirements? I’m happy to tweak it further if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The auto construction of Tool parameters in ComponentTool does not work for more complex types
3 participants