How can images and text be embedded into the same embedding space with CLIP for evaluation? #270

Nomothings · 2025-01-02T12:18:29Z

Feature Description

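Hello, I have run into a problem while evaluating retrieval and generation for multimodal RAG.
Our dataset is multimodal and contains both text and images. While building image_queries.jsonl, I found that queries can only be constructed in one of two forms:
Text only: queries.jsonl
Image only: image_queries.jsonl
There is no retrieval mode for mixed image and text data. (In principle this should be straightforward, since CLIP embeds images and text into the same space.)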
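To make the idea concrete, here is a minimal sketch of ranking over a single mixed index. It assumes the image and text embeddings have already been produced by a CLIP model; the short vectors below are hand-written stand-ins, not real CLIP outputs, and the `cosine` and `rank` helpers and the `text:`/`image:` ID prefixes are hypothetical names used only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, index, top_k=3):
    """Return the IDs of the top_k index entries most similar to query_vec."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Toy vectors standing in for CLIP embeddings (assumption: same dimension,
# same space for both modalities).
index = {
    "text:doc4": [0.9, 0.1, 0.0],
    "image:AMNH.jpg": [0.1, 0.9, 0.2],
}

# Stand-in for the embedding of the text query "building".
query_vec = [0.0, 1.0, 0.1]

ranked = rank(query_vec, index)
print(ranked)
```

Because both modalities live in one index, a text query can retrieve an image or a text passage with the same similarity computation.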

Background

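Our new work cites your evaluation tool and methodology, but we cannot implement one key part: RAG over mixed data. We therefore hope you can update the method, or provide a way to build queries.jsonl for mixed-retrieval evaluation.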

Expected Behavior

This feature would allow images and text to be mixed in one file, for example:
{"_id": "doc4", "text": "随着技术的进步，风能和太阳能等可再生能源变得越来越普及。"}
{"image_path": "custom_eval/multimodal/images/AMNH.jpg", "query": ["building"]}
The evaluation could then run automatically on the mixed data.
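A rough sketch of how such a mixed file might be read, assuming each line is either a text record (with `text`) or an image record (with `image_path`), as in the two example lines above. The `load_mixed` function and the `modality`/`content` output keys are hypothetical and are not part of the existing tool.

```python
import json


def load_mixed(lines):
    """Parse mixed JSONL lines into records tagged with their modality."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        if "image_path" in rec:
            items.append({
                "modality": "image",
                "content": rec["image_path"],
                "queries": rec.get("query", []),
            })
        elif "text" in rec:
            items.append({
                "modality": "text",
                "id": rec.get("_id"),
                "content": rec["text"],
            })
        else:
            raise ValueError(f"record has neither 'text' nor 'image_path': {rec}")
    return items


sample = [
    '{"_id": "doc4", "text": "随着技术的进步，风能和太阳能等可再生能源变得越来越普及。"}',
    '{"image_path": "custom_eval/multimodal/images/AMNH.jpg", "query": ["building"]}',
]

items = load_mixed(sample)
for item in items:
    print(item["modality"], item["content"])
```

Each record could then be routed to the text or image encoder of CLIP according to its `modality` field before being added to a shared index.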

Additional Information

Any other relevant information?

Yunnglin · 2025-01-02T12:37:04Z

Do you want to evaluate the CLIP model's mixed text-and-image retrieval capability?

Nomothings · 2025-01-02T13:00:19Z

Yes, exactly: images and text embedded with CLIP into a single index and evaluated together.

Yunnglin · 2025-01-03T02:03:48Z

OK, we will add this feature as soon as possible.

Nomothings · 2025-01-03T06:49:33Z

Great, thank you. Looking forward to the update from you and your team.

Yunnglin added the enhancement New feature or request label Jan 2, 2025
