-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Open
Labels
P3Low priority, leave it in the backlogLow priority, leave it in the backlog
Description
We would like to use some models via Transformers that support multimodal user messages.
- We want to support (some) Image-Text-to-Text models
- The current component (
HuggingFaceLocalChatGenerator
) might not easy/practical to extend, and it might make sense to develop a dedicated component - I would not give this investigation high-priority: for multimodal open models, it's better to first focus on Ollama that provides more standardization and does not require GPU; I would also expect users who have GPU to run vLLM (currently in Haystack it can be done via OpenAI with some limitations)
Metadata
Metadata
Assignees
Labels
P3Low priority, leave it in the backlogLow priority, leave it in the backlog