-
Notifications
You must be signed in to change notification settings - Fork 2.1k
feat: add support for complex types in component processing #9305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add support for complex types in component processing #9305
Conversation
haystack/tools/component_tool.py
Outdated
@@ -412,6 +412,23 @@ def _create_property_schema(self, python_type: Any, description: str, default: A | |||
f"Pydantic models (e.g. {python_type.__name__}) are not supported as input types for " | |||
f"component's run method." | |||
) | |||
elif python_type is dict or (origin is dict): | |||
schema = {"type": "object", "description": description, "additionalProperties": True} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anakin87 do you happen to know if by adding additionalProperties
here we run the risk of breaking ChatGenerators that don't support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure - it might be better to avoid it if not strictly necessary
I found this: https://platform.openai.com/docs/guides/structured-outputs/additionalproperties-false-must-always-be-set-in-objects
Anyway, it might be a good idea to involve @vblagoje who worked much on this, if we are not in a rush
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the structured output or if you set the JSON response to strict. But that might be a separate issue we need to open where if structured output or strict is set in OpenAIChatGenerator then it should probably go in and update all additionalProperties
to false. I think this would need to be in the ChatGenerator since the ComponenTool (or any Tool) can't know yet if additionalProperties must be set to False.
haystack/tools/component_tool.py
Outdated
@@ -412,6 +412,23 @@ def _create_property_schema(self, python_type: Any, description: str, default: A | |||
f"Pydantic models (e.g. {python_type.__name__}) are not supported as input types for " | |||
f"component's run method." | |||
) | |||
elif python_type is dict or (origin is dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are handling dict
here now could we also remove dict
from the _create_basic_type_schema
function?
# Check the parameter schema | ||
assert "meta" in tool.parameters["properties"] | ||
assert "extraction_kwargs" in tool.parameters["properties"] | ||
|
||
# Meta should be oneOf with both object and array options | ||
meta_schema = tool.parameters["properties"]["meta"] | ||
assert meta_schema["description"].startswith("Optional metadata") | ||
assert "oneOf" in meta_schema | ||
|
||
# extraction_kwargs should be an object | ||
kwargs_schema = tool.parameters["properties"]["extraction_kwargs"] | ||
assert kwargs_schema["type"] == "object" | ||
assert "additionalProperties" in kwargs_schema | ||
assert kwargs_schema["additionalProperties"] is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the additional comments and individual asserts, but could we move update this to a full dict comparison. So
assert tool.parameters["properties"] == {...}
I think that would make it easier to judge at a glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better for readability. But no problem, I will do it
Co-authored-by: Sebastian Husch Lee <[email protected]>
if origin is dict and args and args[0] is str and args[1] is Any: | ||
if description and "meta" in description.lower(): | ||
schema = {"type": "string", "description": description} | ||
else: | ||
schema = {"type": "object", "description": description, "additionalProperties": True} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity why do we need this special case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! The reason we single out pure Dict[str, Any] parameters whose description mentions “meta” is that metadata blobs by definition have an arbitrary, unknown shape, and we don’t want to bake an open-ended object schema (with unpredictable keys) into our function spec.
- If I treated metadata as a normal object with
additionalProperties: true
, undertools_strict
I would have to enumerate every possible metadata key (which we can’t know ahead of time), and LLMs often struggle to fill such free-form nested schemas. - All other dicts (that aren’t metadata) still fall back to an object schema with
additionalProperties: true
.
@sjrl — Could you please take a look and let me know if this now aligns with your requirements? I’m happy to tweak it further if needed. |
Related Issues
parameters
inComponentTool
does not work for more complex types #9292Proposed Changes:
The
ComponentTool
class was incorrectly handling complex types like dictionaries and unions when generating parameter schemas. Specifically:Fixed the schema generation for dictionary types (
Dict[str, Any]
) by:"type": "object"
schema"additionalProperties": true
to allow arbitrary key-value pairsAdded proper handling for Union types by:
oneOf
schema generation for Union typesUnion[Dict[str, Any], List[Dict[str, Any]]]
type hintsThese changes ensure that tools like HTMLToDocument's parameters are correctly represented in the schema, allowing LLMs to properly understand and use the parameters.
How did you test it?
Added a new comprehensive unit test
test_from_component_with_complex_types
that verifies:Ran all existing tests to ensure no regressions
Verified pre-commit hooks pass including:
Notes for the reviewer
Key areas to review:
_create_property_schema
method incomponent_tool.py
where the main changes were madetest_component_tool.py
that verifies the fixoneOf
schema generationChecklist
fix: improve ComponentTool parameter schema for complex types