diff --git a/README.md b/README.md
index 3eeb19c8..c4c16e1b 100644
--- a/README.md
+++ b/README.md
@@ -153,7 +153,6 @@ find an example that creates a custom tool for template matching [here](examples
 | GroundingDINO | GroundingDINO is a tool that can detect arbitrary objects with inputs such as category names or referring expressions. |
 | GroundingSAM | GroundingSAM is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions. |
 | DINOv | DINOv is a tool that can detect arbitrary objects with using a referring mask. |
-| ExtractFrames | ExtractFrames extracts frames with motion from a video. |
 | Crop | Crop crops an image given a bounding box and returns a file name of the cropped image. |
 | BboxArea | BboxArea returns the area of the bounding box in pixels normalized to 2 decimal places. |
 | SegArea | SegArea returns the area of the segmentation mask in pixels normalized to 2 decimal places. |
diff --git a/docs/index.md b/docs/index.md
index e0033dfa..8f7d4cbf 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -84,13 +84,23 @@ you. For example:
 | Tool | Description |
 | --- | --- |
 | CLIP | CLIP is a tool that can classify or tag any image given a set of input classes or tags. |
+| ImageCaption| ImageCaption is a tool that can generate a caption for an image. |
 | GroundingDINO | GroundingDINO is a tool that can detect arbitrary objects with inputs such as category names or referring expressions. |
 | GroundingSAM | GroundingSAM is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions. |
-| Counter | Counter detects and counts the number of objects in an image given an input such as a category name or referring expression. |
+| DINOv | DINOv is a tool that can detect arbitrary objects with using a referring mask. |
 | Crop | Crop crops an image given a bounding box and returns a file name of the cropped image. |
 | BboxArea | BboxArea returns the area of the bounding box in pixels normalized to 2 decimal places. |
 | SegArea | SegArea returns the area of the segmentation mask in pixels normalized to 2 decimal places. |
-| ExtractFrames | ExtractFrames extracts image frames from the input video. |
-
+| BboxIoU | BboxIoU returns the intersection over union of two bounding boxes normalized to 2 decimal places. |
+| SegIoU | SegIoU returns the intersection over union of two segmentation masks normalized to 2 decimal places. |
+| BoxDistance | BoxDistance returns the minimum distance between two bounding boxes normalized to 2 decimal places. |
+| MaskDistance | MaskDistance returns the minimum distance between two segmentation masks in pixel units |
+| BboxContains | BboxContains returns the intersection of two boxes over the target box area. It is good for check if one box is contained within another box. |
+| ExtractFrames | ExtractFrames extracts frames with motion from a video. |
+| ZeroShotCounting | ZeroShotCounting returns the total number of objects belonging to a single class in a given image. |
+| VisualPromptCounting | VisualPromptCounting returns the total number of objects belonging to a single class given an image and visual prompt. |
+| VisualQuestionAnswering | VisualQuestionAnswering is a tool that can explain the contents of an image and answer questions about the image. |
+| ImageQuestionAnswering | ImageQuestionAnswering is similar to VisualQuestionAnswering but does not rely on OpenAI and instead uses a dedicated model for the task. |
+| OCR | OCR returns the text detected in an image along with the location. |
 
 It also has a basic set of calculate tools such as add, subtract, multiply and divide.
diff --git a/vision_agent/agent/vision_agent_prompts.py b/vision_agent/agent/vision_agent_prompts.py
index 14a53972..8b3cbaa1 100644
--- a/vision_agent/agent/vision_agent_prompts.py
+++ b/vision_agent/agent/vision_agent_prompts.py
@@ -70,7 +70,7 @@
 
 Please note that:
 1. You should only choose one tool from the Tool List to solve this question and it should have maximum chance of solving the question.
-2. You should only choose the tool whose parameters are most relevant to the user's question and are availale as part of the question.
+2. You should only choose the tool whose parameters are most relevant to the user's question and are available as part of the question.
 3. You should choose the tool whose return type is most relevant to the answer of the user's question.
 4. You must ONLY output the ID of the tool you chose in a parsible JSON format. Two example outputs look like:
 
@@ -88,7 +88,7 @@
 
 Please note that:
 1. You should only choose one tool from the Tool List to solve this question and it should have maximum chance of solving the question.
-2. You should only choose the tool whose parameters are most relevant to the user's question and are availale as part of the question.
+2. You should only choose the tool whose parameters are most relevant to the user's question and are available as part of the question.
 3. You should choose the tool whose return type is most relevant to the answer of the user's question.
 4. You must ONLY output the ID of the tool you chose in a parsible JSON format. Two example outputs look like:
 
@@ -100,7 +100,7 @@
 CHOOSE_PARAMETER_DEPENDS = """Given a user's question and an API tool documentation, you need to output parameters according to the API tool documentation to successfully call the API to solve the user's question.
 Please note that:
 1. The Example in the API tool documentation can help you better understand the use of the API. Pay attention to the examples which show how to parse the question and extract tool parameters such as prompts and visual inputs.
-2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If there are no paremters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}
+2. Ensure the parameters you output are correct. The output must contain the required parameters, and can contain the optional parameters based on the question. If there are no parameters in the required parameters and optional parameters, just leave it as {{"Parameters":{{}}}}
 3. If the user's question mentions other APIs, you should ONLY consider the API tool documentation I give and do not consider other APIs.
 4. The question may have dependencies on answers of other questions, so we will provide logs of previous questions and answers for your reference.
 5. If you need to use this API multiple times, please set "Parameters" to a list.