[Request] Images don't seem to be supported with Ollama's llama3.2-vision #4642

samurai00 · 2024-11-08T04:04:06Z

🥰 Feature description

Ollama 0.4.0 added support for the llama3.2-vision model, which can recognize images. https://ollama.com/blog/llama3.2-vision

I tried calling the llama3.2-vision model from LobeChat v1.28.4 and found that it does not handle images correctly.

The logs show the relevant request body:

{
  "messages": [
    {
      "content": "图片上有什么文字\n\n<files_info>\n<images>\n<images_docstring>here are user upload images you can refer to</images_docstring>\n<image name=\"deval.png\" url=\"https://s3-lobechat.tabun.pro/files/480844/8b97987b-0e33-4a02-9fbc-f03dc34e0567.png\"></image>\n</images>\n\n</files_info>",
      "role": "user"
    }
  ],
  "model": "llama3.2-vision",
  "options": {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.35,
    "top_p": 1
  },
  "stream": true
}

It looks like the image is being placed inside content. Ollama's llama3.2-vision model may expect images to be supplied in a different way.

It would be great if this could be supported, thanks 🙏!

🧐 Proposed solution

According to Ollama's documentation, the request should look something like this:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'

That is, the image needs to be base64-encoded and placed in the images field, which is an array.
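The request shape from the curl example can be sketched in TypeScript. This is a minimal sketch that assumes the images are already base64 strings without a `data:` prefix. `OllamaChatMessage`, `OllamaChatRequest` and `buildOllamaVisionRequest` are hypothetical names used for illustration. They are not LobeChat or Ollama APIs. Only the field names (`model`, `messages`, `role`, `content`, `images`, `stream`) come from the example above.

```typescript
// Hypothetical types mirroring the /api/chat body in the curl example.
interface OllamaChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  // Raw base64 strings (no "data:" prefix), supplied as an array.
  images?: string[];
}

interface OllamaChatRequest {
  model: string;
  messages: OllamaChatMessage[];
  stream: boolean;
}

// Hypothetical helper: the prompt goes in `content` and the images go in
// `images`, instead of embedding image URLs inside the prompt text.
function buildOllamaVisionRequest(
  model: string,
  prompt: string,
  base64Images: string[],
  stream = false,
): OllamaChatRequest {
  const message: OllamaChatMessage = { role: "user", content: prompt };
  if (base64Images.length > 0) {
    message.images = base64Images;
  }
  return { model, messages: [message], stream };
}

// "iVBORw0KGgo=" is only a placeholder payload, not a complete image.
const body = buildOllamaVisionRequest("llama3.2-vision", "what is in this image?", ["iVBORw0KGgo="]);
console.log(JSON.stringify(body));
```

The serialized object could then be POSTed to `http://localhost:11434/api/chat`, the endpoint used in the curl example.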

📝 Additional information

No response


SpeedupMaster · 2024-11-08T04:19:56Z

Have you set LLM_VISION_IMAGE_USE_BASE64=1?

samurai00 · 2024-11-08T06:21:19Z

> Have you set LLM_VISION_IMAGE_USE_BASE64=1?

» docker exec -it lobe-chat-database sh
/ $ echo $LLM_VISION_IMAGE_USE_BASE64
1
/ $

{
  "messages": [
    {
      "content": "图片中有什么？\n\n\n<files_info>\n<images>\n<images_docstring>here are user upload images you can refer to</images_docstring>\n<image name=\"截屏2024-05-22 17.52.16.png\" url=\"https://s3-lobechat.tabun.pro/files/480846/875b55ec-923a-48ce-b1e3-0730c4a92794.png\"></image>\n</images>\n\n</files_info>",
      "role": "user"
    }
  ],
  "model": "llama3.2-vision",
  "options": {
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.35,
    "top_p": 1
  },
  "stream": true
}

With the environment variable LLM_VISION_IMAGE_USE_BASE64=1 set, the result is still the same.

SpeedupMaster · 2024-11-08T12:52:17Z

Looking at #3888, it seems URL-to-base64 conversion for Ollama hasn't been implemented yet, although some Ollama models can still recognize images.

This appears to be an XML (Extensible Markup Language) file that contains information about a single image. Here's a breakdown of the contents:

- <files_info>: The root element, which indicates that this is a container for file-related information.
- <images>: A child element within <files_info>, suggesting that it holds information about images specifically.
- <image>: A child element within <images>, representing an individual image.
  - name: An attribute of the <image> element, specifying the filename of the image (648557.jpg).
  - url: Another attribute of the <image> element, providing a URL where the image can be accessed (http://localhost:9000/lobe/files/480591/bb8bb4b9-f001-4ba5-8162-acd7c47b688b.jpg).

In summary, this XML snippet describes a single image file with its filename and URL. The context appears to be a web application or API that handles file uploads, as hinted by the localhost:9000 URL.

samurai00 · 2024-11-13T10:41:31Z

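The problem seems to be specific to the database version. Once the image has been uploaded to S3, image_url becomes a real URL instead of a data URL of the form data:[<media-type>][;base64],<data>. In the non-database version the image is already a data URL, so that case appears to work.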
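A real URL and a data URL can be told apart with a small parser for the `data:[<media-type>][;base64],<data>` form quoted above. This is a minimal sketch. `parseBase64DataUrl` is a hypothetical helper and is not LobeChat's `parseDataUri`, whose return type is not shown in this thread. The parser returns `null` for an ordinary `https://` URL, which is the case the database version hits after uploading to S3.

```typescript
interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

// Hypothetical helper: parses "data:[<media-type>][;base64],<data>".
// Returns null for non-data URLs and for data URLs that are not base64-encoded.
function parseBase64DataUrl(url: string): ParsedDataUrl | null {
  if (!url.startsWith("data:")) return null;
  const commaIndex = url.indexOf(",");
  if (commaIndex === -1) return null;

  // e.g. "image/png;base64" or just "base64"
  const meta = url.slice("data:".length, commaIndex);
  const params = meta.split(";");
  if (params[params.length - 1] !== "base64") return null;

  // RFC 2397: an omitted media type defaults to text/plain.
  const mimeType = params[0] && params[0] !== "base64" ? params[0] : "text/plain";
  return { mimeType, base64: url.slice(commaIndex + 1) };
}

console.log(parseBase64DataUrl("data:image/png;base64,iVBORw0KGgo="));
// → { mimeType: 'image/png', base64: 'iVBORw0KGgo=' }
console.log(parseBase64DataUrl("https://example.com/image.png"));
// → null
```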

lobe-chat/src/libs/agent-runtime/ollama/index.ts

Line 100 in f6e4d00

const { base64 } = parseDataUri(content.image_url.url);

lobe-chat/src/libs/agent-runtime/utils/uriParser.ts

Line 7 in f6e4d00

export const parseDataUri = (dataUri: string): UriParserResult => {
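The snippet above only handles data URLs. One way to cover both cases is to reuse the data URL payload when there is one and otherwise download the image and base64-encode it. This is a minimal sketch that assumes a runtime with global `fetch` and `btoa`, such as Node.js 18+ or a browser. `imageUrlToBase64` and `bytesToBase64` are hypothetical names, not the code from the linked PR, and error handling is kept minimal.

```typescript
// Hypothetical helper: encode raw bytes as base64 via btoa on a binary string.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: returns the raw base64 payload Ollama expects in `images`.
async function imageUrlToBase64(url: string): Promise<string> {
  // Data URL (non-database version): reuse the payload after the comma.
  const match = /^data:[^,]*;base64,(.*)$/.exec(url);
  if (match) return match[1];

  // Real URL (e.g. an S3 object in the database version): download it first.
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Failed to fetch image: ${res.status} ${res.statusText}`);
  }
  return bytesToBase64(new Uint8Array(await res.arrayBuffer()));
}

imageUrlToBase64("data:image/png;base64,aGk=").then((b64) => console.log(b64));
// prints aGk=
```

The download path assumes the server running the conversion can reach the image URL.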

- Fix image data not being passed to the correct field in the API request
- Enable proper image recognition for the llama3.2-vision model

fix lobehub#4642, related lobehub#3888
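The fix described in this commit message moves image data out of the prompt text and into the field Ollama reads. It can be sketched as a message conversion. The sketch assumes an OpenAI-style message whose `content` is an array of `text` and `image_url` parts, which matches the `content.image_url.url` access in the index.ts snippet. It also assumes every image URL is already a base64 data URL. `toOllamaMessage` and the part types are hypothetical and are not the actual PR code.

```typescript
// Hypothetical OpenAI-style content parts.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | ContentPart[];
}

interface OllamaMessage {
  role: "system" | "user" | "assistant";
  content: string;
  images?: string[];
}

// Hypothetical converter: text parts are joined into `content`, and image parts
// are reduced to their base64 payload and collected in `images`.
function toOllamaMessage(message: ChatMessage): OllamaMessage {
  if (typeof message.content === "string") {
    return { role: message.role, content: message.content };
  }
  const texts: string[] = [];
  const images: string[] = [];
  for (const part of message.content) {
    if (part.type === "text") {
      texts.push(part.text);
      continue;
    }
    const url = part.image_url.url;
    const comma = url.indexOf(",");
    // Only base64 data URLs are handled; a real URL would need downloading first.
    if (!url.startsWith("data:") || comma === -1 || !url.slice(0, comma).endsWith(";base64")) {
      throw new Error(`Expected a base64 data URL, got: ${url.slice(0, 40)}`);
    }
    images.push(url.slice(comma + 1));
  }
  const result: OllamaMessage = { role: message.role, content: texts.join("\n") };
  if (images.length > 0) result.images = images;
  return result;
}
```

Throwing on a non-data URL keeps the failure visible. Silently forwarding the URL as text would reproduce the behavior reported in this issue.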

samurai00 added the 🌠 Feature Request label Nov 8, 2024

dosubot bot added the ollama and vision labels Nov 8, 2024

samurai00 linked a pull request Nov 15, 2024 that will close this issue

fix: correct image upload field for ollama llama3.2-vision #4697 (Open)
