Investigate image support with Transformers

We would like to use some models via Transformers that support multimodal user messages.

- We want to support (some) [Image-Text-to-Text](https://huggingface.co/models?pipeline_tag=image-text-to-text) models
- The current component (`HuggingFaceLocalChatGenerator`) might not easy/practical to extend, and it might make sense to develop a dedicated component
- I would not give this investigation high-priority: for multimodal open models, it's better to first focus on Ollama that provides more standardization and does not require GPU; I would also expect users who have GPU to run vLLM (currently in Haystack it can be done via OpenAI with some limitations)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Investigate image support with Transformers #9672

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Investigate image support with Transformers #9672

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions