Adding OWLViT/OWLV2 as options for the visual grounding part #55

skulshreshtha · 2024-04-05T12:24:18Z

🚀 Feature

Currently, the project uses GroundingDINO as its visual grounding model, which is the best-performing model on some benchmark datasets.

We could give users the flexibility to choose between different visual grounding models, such as OWLViT and OWLV2.

Motivation & Examples

Tell us why the feature is useful.
Since this project is about text-guided segmentation, letting the user choose the model used for the visual grounding step seems like a natural addition.

Describe what the feature would look like, if it is implemented.
Best demonstrated using code examples in addition to words.

from PIL import Image
from lang_sam import LangSAM

# Initialize and, if desired, select the visual grounding model.
# The default would be 'groundingdino'. Other options are 'ofa', 'owlvit', and 'owlv2'.
model = LangSAM(model='groundingdino')
image_pil = Image.open("./assets/car.jpeg").convert("RGB")
text_prompt = "wheel"
masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)
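
One way the proposed `model` argument could be dispatched internally is a small name-to-backend registry. The sketch below is only an illustration of that idea, not the library's actual code: `Grounder`, `StubGrounder`, `GROUNDER_REGISTRY`, and `build_grounder` are hypothetical names, and the stub backends return no detections where real ones would wrap GroundingDINO, OWLViT, and so on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


class Grounder(Protocol):
    """Hypothetical interface that every visual grounding backend would satisfy."""

    def detect(self, image, text_prompt: str) -> Tuple[List[Box], List[str], List[float]]:
        ...


@dataclass
class StubGrounder:
    """Placeholder backend; a real one would wrap an actual detection model."""

    name: str

    def detect(self, image, text_prompt: str) -> Tuple[List[Box], List[str], List[float]]:
        # Returns empty boxes, phrases, and scores; only the dispatch is illustrated.
        return [], [], []


# Hypothetical registry mapping a model name to a factory for its backend.
GROUNDER_REGISTRY: Dict[str, Callable[[], Grounder]] = {
    "groundingdino": lambda: StubGrounder("groundingdino"),
    "ofa": lambda: StubGrounder("ofa"),
    "owlvit": lambda: StubGrounder("owlvit"),
    "owlv2": lambda: StubGrounder("owlv2"),
}


def build_grounder(model: str = "groundingdino") -> Grounder:
    """Look up a backend by case-insensitive name, failing loudly on unknown names."""
    key = model.lower()
    if key not in GROUNDER_REGISTRY:
        options = ", ".join(sorted(GROUNDER_REGISTRY))
        raise ValueError(f"Unknown grounding model {model!r}; choose one of: {options}")
    return GROUNDER_REGISTRY[key]()
```

With a registry like this, the default behaviour stays unchanged for existing users, and adding another grounding model would only require registering one more factory.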

Note

We only consider adding new features if they are relevant to this library.
Consider if this new feature deserves to be here or should be a new library.


luca-medeiros · 2024-04-06T11:31:21Z

@skulshreshtha Interesting!
Do you want to try an implementation for it?

skulshreshtha · 2024-04-06T12:17:00Z

@luca-medeiros Yes, sure. If you think this makes sense, I can try and raise a PR for this.

ogencoglu · 2024-10-25T16:51:43Z

+1 for this

skulshreshtha added the "enhancement" (New feature or request) label on Apr 5, 2024
