Can I input 2688×672 images? #14

mvsoom · 2024-04-16T15:03:06Z

First of all, this looks amazing, thanks for open-sourcing this!

I have a general question about LLaVA-UHD. The conclusion of the paper says:

Conclusion

In this work, we present LLaVA-UHD, a large multimodal model that efficiently perceives any aspect ratio and high-resolution images. [...] In this work, we limit the resolution of LLaVA-UHD to maximum of 672×1008. In future, considering the promising efficiency and scalability, we will explore higher-resolution images and more challenging tasks such as small object detection and segmentation. [...]

Does this mean that the maximum resolution supported by the implementation in this repo is 672×1008, or can I input images of arbitrary aspect ratio and resolution? I am specifically interested in 2688×672 images (2 rows and 8 columns of 336×336 patches).
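
To make the arithmetic behind the question concrete, here is a minimal sketch that counts how many 336×336 patches tile a given image size. The helper name `patch_grid` is hypothetical and is not taken from this repo; it only does the division and assumes the image dimensions are exact multiples of the patch size. It says nothing about how LLaVA-UHD actually slices images.

```python
PATCH = 336  # patch side length mentioned in this issue


def patch_grid(width, height, patch=PATCH):
    """Return (columns, rows, total) for an image tiled exactly by square patches.

    Hypothetical helper for illustration only, not the repo's slicing logic.
    """
    if width % patch or height % patch:
        raise ValueError("width and height must be multiples of the patch size")
    cols = width // patch
    rows = height // patch
    return cols, rows, cols * rows


# The size asked about here: 8 columns and 2 rows, 16 patches in total.
print(patch_grid(2688, 672))

# The maximum stated in the paper's conclusion: 6 patches in total.
print(patch_grid(672, 1008))
```

Under these assumptions, a 2688×672 input needs 16 patches, compared with 6 for the 672×1008 maximum quoted from the paper.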
