Skip to content

Investigate image support with Transformers #9672

@anakin87

Description

@anakin87

We would like to use some models via Transformers that support multimodal user messages.

  • We want to support (some) Image-Text-to-Text models
  • The current component (HuggingFaceLocalChatGenerator) might not easy/practical to extend, and it might make sense to develop a dedicated component
  • I would not give this investigation high-priority: for multimodal open models, it's better to first focus on Ollama that provides more standardization and does not require GPU; I would also expect users who have GPU to run vLLM (currently in Haystack it can be done via OpenAI with some limitations)

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low priority, leave it in the backlog

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions