Question on visual grounding: what format should I use for the region? #164

Open
LokiXun opened this issue Jul 23, 2024 · 0 comments
LokiXun commented Jul 23, 2024

Hi, I have a question about visual grounding.
I have a 720x1280 (height x width) image and I want to describe the region [0, 0, 512, 512] (x1, y1, x2, y2), so I followed CogVLM1's suggestion and converted the coordinates this way ( https://github.com/THUDM/CogVLM?tab=readme-ov-file#cookbook )

Format of coordination: The bounding box coordinates in the model's input and output use the format [[x1, y1, x2, y2]], with the origin at the top left corner, the x-axis to the right, and the y-axis downward. (x1, y1) and (x2, y2) are the top-left and bottom-right corners, respectively, with values as relative coordinates multiplied by 1000 (prefixed with zeros to three digits).

My prompt is the following, but the model tends to describe the whole image instead of the region. Is my prompt right?

Tell me what you see within the designated area [[000,000,400,712]] in the picture

# this is how I get the region value
original region: [0,0,512,512]
target format: [[000,000,512/1280*1000, 512/720*1000]] >> [[000,000,400,712]]
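
The conversion above can be sketched in Python as follows. This is only a minimal sketch: the function name `to_cogvlm_box` is hypothetical, and rounding to the nearest integer and clamping to 0..999 are assumptions, since the README only says the values are relative coordinates multiplied by 1000 and zero-padded to three digits. With nearest-integer rounding, 512/720*1000 ≈ 711.1 becomes 711 rather than the 712 used in my prompt.

```python
def to_cogvlm_box(box, width, height):
    """Convert a pixel box (x1, y1, x2, y2) to a '[[x1,y1,x2,y2]]' string.

    x values are divided by the image width, y values by the image height,
    then scaled by 1000 and zero-padded to three digits.
    """
    x1, y1, x2, y2 = box
    scaled = [
        x1 / width * 1000,
        y1 / height * 1000,
        x2 / width * 1000,
        y2 / height * 1000,
    ]
    # Assumption: round to the nearest integer and clamp to 0..999
    # so that every value fits in exactly three digits.
    values = [min(max(round(v), 0), 999) for v in scaled]
    return "[[" + ",".join(f"{v:03d}" for v in values) + "]]"


# Region [0, 0, 512, 512] in an image that is 1280 wide and 720 high.
print(to_cogvlm_box((0, 0, 512, 512), width=1280, height=720))
# prints [[000,000,400,711]]
```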

example:

Tell me what you see within the designated area [[000,000,400,712]] in the picture. Describe each object in a simple sentence is enough.

(two images attached)

CogVLM2's result

CogVLM2: Within the designated area, the foreground displays a green bus, parked cars, and a pedestrian crossing sign, while the background includes a blue bus stop sign, trees, and a building, all under a clear sky.<|end_of_text|>