Incorrect Output in Visual Grounding Task #33

Open
@sleepyshep

Hello! I've encountered an issue with the visual grounding task: the model frequently outputs meaningless bounding boxes. I'm testing the deepseek-vl2-small model. Below are the test code and an example of incorrect output.

Test Code

expression = img_info['caption']
prompt = f"<image>\n<|ref|>{expression}<|/ref|>."
conversation = [
    {
        "role": "<|User|>",
        "content": prompt,
        "images": [image_path],
    },
    {"role": "<|Assistant|>", "content": ""},
]

pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True,
    system_prompt=""
).to(vl_gpt.device)

inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

outputs = vl_gpt.language.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)
answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=False)
print(f"{prepare_inputs['sft_format'][0]}", answer)

Wrong Case

Image:
2350216

Prompt:
<image>\n<|ref|>The hat which is white.<|/ref|>.

Response:
<|ref|>The hat which is white. .<|/ref|><|det|>[[2, 159, 4, 43958, 970]]<|/det|>

Sometimes, the response is even more nonsensical, such as:
<|ref|>Small and silver, this mirror gle.<|/ref|><|det|>[[0, 60, 30, 999999, 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999

There are many similar errors where the model outputs nonsensical bounding boxes. I would appreciate any guidance on how to resolve this issue.
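
To measure how often this happens, a small check like the following could be run on each decoded `answer`. It is only a sketch under stated assumptions: that a well-formed box is four integers `[x1, y1, x2, y2]` and that coordinates lie on a 0-999 normalized grid. `COORD_MAX`, `check_box`, and `parse_boxes` are hypothetical names for illustration and are not part of the `deepseek_vl2` package.

```python
import re

# Assumption: coordinates are integers normalized to 0..999; change this
# constant if the model actually uses a different range.
COORD_MAX = 999

DET_PATTERN = re.compile(r"<\|det\|>(.*?)<\|/det\|>", re.DOTALL)
BOX_PATTERN = re.compile(r"\[([^\[\]]*)\]")  # innermost [...] groups only


def check_box(raw):
    """Return (box, None) if raw is a valid box, else (None, reason)."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 4:
        return None, f"expected 4 values, got {len(parts)}"
    if not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        return None, "non-integer value"
    # Check the digit count before int() so runaway digit strings are
    # rejected without converting them.
    if any(len(p) > 9 or int(p) > COORD_MAX for p in parts):
        return None, f"value outside 0..{COORD_MAX}"
    x1, y1, x2, y2 = (int(p) for p in parts)
    if x1 > x2 or y1 > y2:
        return None, "inverted corners"
    return (x1, y1, x2, y2), None


def parse_boxes(answer):
    """Return (valid_boxes, problems) for one decoded response string."""
    valid, problems = [], []
    spans = DET_PATTERN.findall(answer)
    if not spans:
        # Covers truncated generations that never emit <|/det|>.
        return valid, ["no closed <|det|>...<|/det|> span"]
    for span in spans:
        raw_boxes = BOX_PATTERN.findall(span)
        if not raw_boxes:
            problems.append("no box found inside <|det|> span")
        for raw in raw_boxes:
            box, reason = check_box(raw)
            if box is None:
                problems.append(f"{reason}: [{raw}]")
            else:
                valid.append(box)
    return valid, problems


if __name__ == "__main__":
    bad = ("<|ref|>The hat which is white. .<|/ref|>"
           "<|det|>[[2, 159, 4, 43958, 970]]<|/det|>")
    print(parse_boxes(bad))
```

Counting the responses for which `problems` is non-empty gives a rough failure rate across the test set. It only flags malformed output and says nothing about whether a well-formed box is actually correct.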

Thank you!
