updated docs
dillonalaird committed Aug 28, 2024
1 parent 62e8a86 commit dbd55ae
Showing 3 changed files with 25 additions and 16 deletions.
README.md: 18 changes (11 additions & 7 deletions)
@@ -41,15 +41,15 @@

```bash
export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. `VisionAgentCoder` is an agent specifically
for writing code for vision tasks, such as counting people in an image. However, it
cannot chat with you and can only respond with code. `VisionAgent` can call
`VisionAgentCoder` to write vision code.

#### Basic Usage
To run the Streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
...
```

@@ -146,7 +146,7 @@
the code and having it update. You just need to add the code as a response from
assistant:

```python
agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
        ...
    },
]
```

@@ -212,6 +212,10 @@
function. Make sure the documentation is in the same format above with descriptions for
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
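As an illustration, a tool documented in that format might look like the sketch below (the function `area_of_mask` is hypothetical and not part of the library; only the docstring layout follows the requirements above):

```python
import numpy as np


def area_of_mask(mask: np.ndarray) -> float:
    """'area_of_mask' returns the number of foreground pixels in a binary mask.

    Parameters:
        mask: A binary numpy array where 1 marks the object and 0 the background.

    Returns:
        float: The area of the mask in pixels.

    Example
    -------
        >>> area_of_mask(np.array([[0, 1], [1, 1]]))
        3.0
    """
    return float(np.asarray(mask).sum())
```

The agent reads this docstring to decide when and how to call the tool, so the `Parameters:` and `Returns:` sections should describe every argument and the return value.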

Can't find the tool you need and want to add it to `VisionAgent`? Check out our
[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository, where
we keep the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
...
docs/index.md: 21 changes (13 additions & 8 deletions)
@@ -38,15 +38,15 @@

```bash
export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. `VisionAgentCoder` is an agent specifically
for writing code for vision tasks, such as counting people in an image. However, it
cannot chat with you and can only respond with code. `VisionAgent` can call
`VisionAgentCoder` to write vision code.

#### Basic Usage
To run the Streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
...
```

@@ -143,7 +143,7 @@
the code and having it update. You just need to add the code as a response from
assistant:

```python
agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
        ...
    },
]
```

@@ -209,6 +209,10 @@
function. Make sure the documentation is in the same format above with descriptions for
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

Can't find the tool you need and want to add it to `VisionAgent`? Check out our
[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository, where
we keep the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
...

@@ -230,6 +234,7 @@
tools. You can use it just like you would use `VisionAgentCoder`:

```python
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
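Before running the snippet above, the models have to be pulled locally; a sketch of that setup (the model names `llama3.1` and `mxbai-embed-large` are assumptions based on the library's docs at the time, so check them before running):

```shell
# Pull the LLM and the embedding model used by OllamaVisionAgentCoder
# (model names are assumptions; verify against the vision-agent docs).
ollama pull llama3.1
ollama pull mxbai-embed-large
```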
> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
### Azure OpenAI
We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
...

@@ -241,7 +246,7 @@
follow the Azure Setup section below. You can use it just like you would use
`VisionAgentCoder`:

```python
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
...
vision_agent/utils/image_utils.py: 2 changes (1 addition & 1 deletion)
@@ -70,7 +70,7 @@

```python
def rle_decode_array(rle: Dict[str, List[int]]) -> np.ndarray:
    r"""Decode a run-length encoded mask. Returns numpy array, 1 - mask, 0 - background.

    Parameters:
        rle: The run-length encoded mask.
    """
    size = rle["size"]
    counts = rle["counts"]
    ...
```
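To make the decoding concrete, here is a minimal, self-contained sketch of the same idea (an illustration, not the library's exact implementation; it assumes COCO-style runs stored in column-major order and starting with background):

```python
import numpy as np


def rle_decode(rle: dict) -> np.ndarray:
    """Decode a run-length encoded mask: 1 = mask, 0 = background."""
    size = rle["size"]
    counts = rle["counts"]
    # Runs alternate between background (0) and mask (1), starting with background.
    values = np.concatenate(
        [np.full(c, i % 2, dtype=np.uint8) for i, c in enumerate(counts)]
    )
    # COCO-style RLE stores pixels in column-major (Fortran) order.
    return values.reshape(size, order="F")


# 2x3 mask: 2 background pixels, then 3 mask pixels, then 1 background pixel.
mask = rle_decode({"size": [2, 3], "counts": [2, 3, 1]})
```

With `counts = [2, 3, 1]` on a 2x3 grid this yields three foreground pixels, filled down the columns rather than across the rows.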
