🔖 Aiming at next-era cornerstone research
⭐ Low-level Visual Perception | Multi-Modality Large Language Models | Visual Quality Assessment
- ④Co-Instruct: Homepage, Repo, Demo. An open-ended visual quality comparer (up to 4 images) and low-level visual assistant; an improved version of ②Q-Instruct [CVPR 2024].
- ③Q-Align [ICML 2024]: Homepage, Repo, Demo. A unified visual scorer for images and videos via text-instructed alignment of multi-modality foundation models; fine-tunes efficiently to further datasets with consistently strong performance. State-of-the-art on IQA, VQA, and IAA. See the usage sketch after this list.
- ②Q-Instruct [CVPR 2024]: Homepage, Repo, 200K Dataset, Technical Report. A large-scale instruction-tuning dataset that improves the low-level perceptual abilities of foundation models.
- ①Q-Bench+ [ICLR 2024, Spotlight]: Homepage, Repo, Data-Single, Data-Pair, Preprint. The first benchmark of foundation models on low-level vision.
- Q-Boost: Homepage. A discussion on boosting IQA performance for MLLMs that are not specifically aligned for IQA.
- [Pending] Chinese-Q-Bench / 质衡: Homepage, Repo. The first attempt to test multi-lingual abilities on low-level vision.
Maintained by Teo Wu@Singapore and Zicheng Zhang@Shanghai.