diff --git a/README.md b/README.md
index f41bef31..88c59973 100644
--- a/README.md
+++ b/README.md
@@ -41,15 +41,15 @@ export OPENAI_API_KEY="your-api-key"
 ```
 
 ### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
-access to tools that allow it to write an navigate python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
+access to tools that allow it to write and navigate Python code and file systems. It can
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.
 
 #### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the Streamlit app locally to chat with `VisionAgent`, you can run the following
 command:
 
 ```bash
@@ -146,7 +146,7 @@ the code and having it update.
 You just need to add the code as a response from assistant:
 
 ```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
 conv = [
     {
         "role": "user",
@@ -212,6 +212,10 @@ function. Make sure the documentation is in the same format above with descripti
 `Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
 [here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
 
+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository,
+where we keep the source code for all the tools used in `VisionAgent`.
+
 ## Additional Backends
 ### Ollama
 We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
diff --git a/docs/index.md b/docs/index.md
index 8569c5cc..0f5022f9 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -38,15 +38,15 @@ export OPENAI_API_KEY="your-api-key"
 ```
 
 ### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
-access to tools that allow it to write an navigate python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
+access to tools that allow it to write and navigate Python code and file systems. It can
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.
 
 #### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the Streamlit app locally to chat with `VisionAgent`, you can run the following
 command:
 
 ```bash
@@ -143,7 +143,7 @@ the code and having it update.
 You just need to add the code as a response from assistant:
 
 ```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
 conv = [
     {
         "role": "user",
@@ -209,6 +209,10 @@ function. Make sure the documentation is in the same format above with descripti
 `Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
 [here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
 
+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository,
+where we keep the source code for all the tools used in `VisionAgent`.
+
 ## Additional Backends
 ### Ollama
 We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
@@ -230,6 +234,7 @@ tools. You can use it just like you would use `VisionAgentCoder`:
 >>> agent = va.agent.OllamaVisionAgentCoder()
 >>> agent("Count the apples in the image", media="apples.jpg")
 ```
+> WARNING: `VisionAgent` doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
 
 ### Azure OpenAI
-We also provide a `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
+We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
@@ -241,7 +246,7 @@ follow the Azure Setup section below. You can use it just like you would use
 >>> agent = va.agent.AzureVisionAgentCoder()
 >>> agent("Count the apples in the image", media="apples.jpg")
 ```
-> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
+
 
 ### Azure Setup
 If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
diff --git a/vision_agent/utils/image_utils.py b/vision_agent/utils/image_utils.py
index d2bc8a6d..54688f93 100644
--- a/vision_agent/utils/image_utils.py
+++ b/vision_agent/utils/image_utils.py
@@ -70,7 +70,7 @@ def rle_decode_array(rle: Dict[str, List[int]]) -> np.ndarray:
     r"""Decode a run-length encoded mask. Returns numpy array, 1 - mask, 0 - background.
 
     Parameters:
-        mask: The mask in run-length encoded as an array.
+        rle: The run-length encoded mask.
     """
     size = rle["size"]
     counts = rle["counts"]
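Note on the final `image_utils.py` hunk: the patch shows only the docstring and the first two lines of `rle_decode_array`. As a reading aid, here is a minimal sketch of how a decoder with this signature typically works. Everything after `counts = rle["counts"]` is an assumption based on the common COCO-style RLE scheme (runs alternate background/mask starting with background, pixels stored column-major); it is not necessarily this library's actual implementation.

```python
from typing import Dict, List

import numpy as np


def rle_decode_array(rle: Dict[str, List[int]]) -> np.ndarray:
    r"""Decode a run-length encoded mask. Returns numpy array, 1 - mask, 0 - background.

    Parameters:
        rle: The run-length encoded mask.
    """
    size = rle["size"]
    counts = rle["counts"]

    # ASSUMPTION (COCO-style RLE): counts alternate background/mask runs, starting
    # with background, so even-indexed runs expand to 0s and odd-indexed runs to 1s.
    values = np.concatenate(
        [np.full(count, i % 2, dtype=np.uint8) for i, count in enumerate(counts)]
    )
    # ASSUMPTION (COCO-style RLE): pixels are stored in column-major (Fortran) order.
    return values.reshape(size, order="F")
```

Under those assumptions, `rle_decode_array({"size": [2, 2], "counts": [1, 2, 1]})` returns `[[0, 1], [1, 0]]`: one background pixel, then two mask pixels, then one background pixel, filled column by column.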