First of all, this looks amazing, thanks for open sourcing this!
I have a general question about LLaVA-UHD. The conclusion of the paper says:
Conclusion
In this work, we present LLaVA-UHD, a large multimodal model that efficiently perceives any aspect ratio and high-resolution images. [...] In this work, we limit the resolution of LLaVA-UHD to maximum of 672×1008. In future, considering the promising efficiency and scalability, we will explore higher-resolution images and more challenging tasks such as small object detection and segmentation. [...]
Does this mean that the maximum resolution of the implementation in this repo is 672×1008, or can I input images of arbitrary resolution and aspect ratio? I am specifically interested in 2688×672 (2 rows and 8 columns of 336×336 patches).
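To make the patch arithmetic concrete, here is a minimal sketch of the counting I have in mind. The helper `patch_grid` is hypothetical (it is not taken from this repo), and reading the sizes as width × height is my own assumption. It only counts 336×336 tiles and says nothing about how the model actually slices images.

```python
PATCH = 336  # patch side length in pixels, as used in the question above


def patch_grid(width, height, patch=PATCH):
    """Return (rows, cols) of patch-sized tiles that exactly cover width x height.

    Hypothetical helper for illustration only; not part of the LLaVA-UHD code.
    """
    if width % patch or height % patch:
        raise ValueError("resolution is not a multiple of the patch size")
    return height // patch, width // patch


# Target resolution from the question, read as width x height (assumption).
target_rows, target_cols = patch_grid(2688, 672)
target_total = target_rows * target_cols

# Maximum resolution quoted from the paper, read the same way (assumption).
paper_rows, paper_cols = patch_grid(672, 1008)
paper_total = paper_rows * paper_cols

print(f"target: {target_rows} x {target_cols} = {target_total} patches")
print(f"paper max: {paper_rows} x {paper_cols} = {paper_total} patches")
```

Under these assumptions the target needs 16 patches, compared with 6 at the paper's stated maximum, which is why I am asking whether the limit applies to the code as well.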