We appreciate your outstanding effort in maintaining this great resource for the MLLM community.
We would like to recommend our recent work “From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D”, which focuses on enabling VLMs to perform 3D spatial reasoning using only 2D observations. We introduce:
- SPAR-7M: a large-scale dataset with 7M QA pairs spanning 33 spatial tasks.
- SPAR-Bench: a comprehensive benchmark covering both low-level perception and high-level 3D reasoning in single-view and multi-view settings.
We hope our work can be considered for inclusion in the Datasets & Benchmarks or Multimodal Spatial Understanding sections.