Add DINOv as a new tool #44

Merged on Apr 19, 2024: 32 commits into main from add-dinov

Conversation

humpydonkey
Collaborator

Example usage

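from vision_agent.tools.tools import DINOv

# Image to run the tool on.
img_path = "/Users/asia/Downloads/data/bags.jpg"
request = {
    # Visual prompts: each entry pairs a mask with the image it refers to.
    "prompt": [
        {
            "mask": "/Users/asia/Downloads/data/mask_prompt_bags0.jpg",
            "image": img_path,
        },
        {
            "mask": "/Users/asia/Downloads/data/mask_prompt_bags1.jpg",
            "image": img_path,
        },
    ],
    "image": img_path,
}
# Instantiate the tool and call it with the request fields as keyword arguments.
res = DINOv()(**request)
print(res)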
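
When several mask prompts share one image, as in the example above, the request dict can be assembled programmatically. The sketch below is only an illustration: `build_dinov_request` is a hypothetical helper that is not part of this PR, and it assumes the request keeps the same `"prompt"` / `"image"` layout shown in the example. The relative paths are placeholders.

```python
from typing import Any, Dict, List


def build_dinov_request(image_path: str, mask_paths: List[str]) -> Dict[str, Any]:
    """Build a request dict in the shape used by the example above.

    Hypothetical helper (not part of vision_agent): every mask prompt is
    paired with the same image that inference runs on.
    """
    if not mask_paths:
        raise ValueError("at least one mask prompt is required")
    return {
        "prompt": [{"mask": mask, "image": image_path} for mask in mask_paths],
        "image": image_path,
    }


# Placeholder paths; substitute real files before calling the tool.
request = build_dinov_request(
    "data/bags.jpg",
    ["data/mask_prompt_bags0.jpg", "data/mask_prompt_bags1.jpg"],
)
```

The resulting dict can then be passed on as in the example, i.e. `DINOv()(**request)`.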

AsiaCao and others added 30 commits April 9, 2024 09:42
* Update prompts.py

* Update vision_agent_prompts.py

* Update reflexion_prompts.py

* Update vision_agent_prompts.py

* Update easytool_prompts.py

* Update prompts.py

* Update vision_agent_prompts.py
* get endpoint ready for demo

fixed tools.json

Update vision_agent/tools/tools.py

Bug fixes

* Fix linter errors

* Fix a bug in result parsing

* Include scores in the G-SAM model response

* Removed tools.json, need to find better format

* Fixing the endpoint for CLIP and adding thresholds for grounding tools

* fix mypy errors

* fixed example notebook

---------

Co-authored-by: Yazhou Cao <[email protected]>
Co-authored-by: shankar_ws3 <[email protected]>
Add a callback for reporting the chat progress of an agent
Co-authored-by: Yazhou Cao <[email protected]>
Fix another typo

Co-authored-by: Yazhou Cao <[email protected]>
* fix visualization error

* added font and score to viz

* changed to smaller font file

* Support streaming chat logs of an agent (#47)

Add a callback for reporting the chat progress of an agent

* fix visualize score issue

* updated descriptions, fixed counter bug

* added visualize_output

* make feedback more concrete

* made naming more consistent

* replaced individual calc ops with calculator tool

* fix random colors

* fix prompts for tools

* update reflection prompt

* update readme

* formatting fix

* fixed mypy errors

* fix merge issue

---------

Co-authored-by: Asia <[email protected]>
added image caption tool
* Switch the host of model endpoint to api.dev.landing.ai
* DRY/Abstract out the inference code in tools
* Introduce LandingaiAPIKey and support loading from .env file
* Add integration tests for four model tools
* Minor tweaks/fixes
* Remove dead code
* Bump the minor version to 0.1.0
* visualized output/reflection to handle extract_frames_

* remove ipdb

* added json mode for lmm, upgraded gpt-4-turbo

* updated reflection prompt

* refactor to make function simpler

* updated reflection prompt, add tool usage doc

* fixed format issue

* fixed type issue

* fixed test case
* Tweak frame extraction function

* remove default motion detection, extract at 0.5 fps

* lmm now takes multiple images

* removed counter

* tweaked prompt

* updated vision agent to reflect on multiple images

* fix test case

* added box distance

* adjusted prompts

---------

Co-authored-by: Yazhou Cao <[email protected]>
Co-authored-by: Dillon Laird <[email protected]>
@dillonalaird dillonalaird merged commit 7d72439 into main Apr 19, 2024
7 checks passed
@dillonalaird dillonalaird deleted the add-dinov branch April 22, 2024 16:59
5 participants