From 2c9c5c5dd8d044a033bdd51c074670300b6159f6 Mon Sep 17 00:00:00 2001
From: Dillon Laird
Date: Mon, 23 Sep 2024 10:49:06 -0700
Subject: [PATCH] updated readme

---
 README.md     | 70 +++++++++++++++++++++++++++++++++++++++++++--------
 docs/index.md | 70 +++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 118 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index 88c59973..1529e354 100644
--- a/README.md
+++ b/README.md
@@ -33,10 +33,11 @@ To get started, you can install the library using pip:
 pip install vision-agent
 ```

-Ensure you have an OpenAI API key and set it as an environment variable (if you are
-using Azure OpenAI please see the Azure setup section):
+Ensure you have an Anthropic API key and an OpenAI API key and set them in your
+environment variables (if you are using Azure OpenAI please see the Azure setup section):

 ```bash
+export ANTHROPIC_API_KEY="your-api-key"
 export OPENAI_API_KEY="your-api-key"
 ```
@@ -71,6 +72,9 @@ You can find more details about the streamlit app [here](examples/chat/).
 >>> resp = agent(resp)
 ```

+`VisionAgent` currently uses Claude-3.5 as its default LMM and uses OpenAI embeddings
+for tool search.
+
 ### Vision Agent Coder
 #### Basic Usage
 You can interact with the agent as you would with any LLM or LMM model:
@@ -132,7 +136,8 @@ of the input is a list of dictionaries with the keys `role`, `content`, and `med
     "code": "from vision_agent.tools import ..."
     "test": "calculate_filled_percentage('jar.jpg')",
     "test_result": "...",
-    "plan": [{"code": "...", "test": "...", "plan": "..."}, ...],
+    "plans": {"plan1": {"thoughts": "..."}, ...},
+    "plan_thoughts": "...",
     "working_memory": ...,
 }
 ```
@@ -169,20 +174,25 @@ result = agent.chat_with_workflow(conv)
 ### Tools
 There are a variety of tools for the model or the user to use. Some are executed
 locally while others are hosted for you. You can easily access them yourself, for example if
-you want to run `owl_v2` and visualize the output you can run:
+you want to run `owl_v2_image` and visualize the output you can run:

 ```python
 import vision_agent.tools as T
 import matplotlib.pyplot as plt

 image = T.load_image("dogs.jpg")
-dets = T.owl_v2("dogs", image)
+dets = T.owl_v2_image("dogs", image)
 viz = T.overlay_bounding_boxes(image, dets)
 plt.imshow(viz)
 plt.show()
 ```

-You can also add custom tools to the agent:
+You can find all available tools in `vision_agent/tools/tools.py`; however,
+`VisionAgentCoder` only uses a subset of tools that have been tested and provide
+the best performance. Those can be found in the same file under the `TOOLS` variable.
+
+If you can't find the tool you are looking for, you can also add custom tools to the
+agent:

 ```python
 import vision_agent as va
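As a quick illustration of the `owl_v2_image` snippet in the hunk above, here is a minimal sketch of using the detections programmatically rather than only plotting them. It assumes each detection is a dictionary with a `"label"` key and that `T.save_image` is available; neither is part of this patch.

```python
import vision_agent.tools as T

image = T.load_image("dogs.jpg")
dets = T.owl_v2_image("dogs", image)

# Count the detections and list their labels (the "label" key is assumed).
print(f"Found {len(dets)} detections: {[d['label'] for d in dets]}")

# Overlay the boxes and write the result to disk (T.save_image is assumed).
viz = T.overlay_bounding_boxes(image, dets)
T.save_image(viz, "dogs_annotated.png")
```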
@@ -217,9 +227,48 @@ Can't find the tool you need and want add it to `VisionAgent`? Check out our
 we add the source code for all the tools used in `VisionAgent`.

 ## Additional Backends
+### Anthropic
+`AnthropicVisionAgentCoder` uses Anthropic models. To get started you just need to get
+an Anthropic API key and set it in your environment variables:
+
+```bash
+export ANTHROPIC_API_KEY="your-api-key"
+```
+
+Because Anthropic does not offer embedding models, the default embedding model used
+is the OpenAI model, so you will also need to set your OpenAI API key:
+
+```bash
+export OPENAI_API_KEY="your-api-key"
+```
+
+Usage is the same as `VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.AnthropicVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+### OpenAI
+`OpenAIVisionAgentCoder` uses OpenAI models. To get started you just need to get an
+OpenAI API key and set it in your environment variables:
+
+```bash
+export OPENAI_API_KEY="your-api-key"
+```
+
+Usage is the same as `VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.OpenAIVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+
 ### Ollama
-We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
-a few models:
+`OllamaVisionAgentCoder` uses Ollama. To get started you must download a few models:

 ```bash
 ollama pull llama3.1
@@ -240,9 +289,8 @@ tools. You can use it just like you would use `VisionAgentCoder`:
 > WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.

 ### Azure OpenAI
-We also provide a `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
-follow the Azure Setup section below. You can use it just like you would use=
-`VisionAgentCoder`:
+`AzureVisionAgentCoder` uses Azure OpenAI models. To get started follow the Azure Setup
+section below. You can use it just like you would use `VisionAgentCoder`:

 ```python
 >>> import vision_agent as va
diff --git a/docs/index.md b/docs/index.md
index 0f5022f9..a83e343e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -30,10 +30,11 @@ To get started, you can install the library using pip:
 pip install vision-agent
 ```

-Ensure you have an OpenAI API key and set it as an environment variable (if you are
-using Azure OpenAI please see the Azure setup section):
+Ensure you have an Anthropic API key and an OpenAI API key and set them in your
+environment variables (if you are using Azure OpenAI please see the Azure setup section):

 ```bash
+export ANTHROPIC_API_KEY="your-api-key"
 export OPENAI_API_KEY="your-api-key"
 ```
@@ -68,6 +69,9 @@ You can find more details about the streamlit app [here](examples/chat/).
 >>> resp = agent(resp)
 ```

+`VisionAgent` currently uses Claude-3.5 as its default LMM and uses OpenAI embeddings
+for tool search.
+
 ### Vision Agent Coder
 #### Basic Usage
 You can interact with the agent as you would with any LLM or LMM model:
@@ -129,7 +133,8 @@ of the input is a list of dictionaries with the keys `role`, `content`, and `med
     "code": "from vision_agent.tools import ..."
     "test": "calculate_filled_percentage('jar.jpg')",
     "test_result": "...",
-    "plan": [{"code": "...", "test": "...", "plan": "..."}, ...],
+    "plans": {"plan1": {"thoughts": "..."}, ...},
+    "plan_thoughts": "...",
     "working_memory": ...,
 }
 ```
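The hunk above (mirrored in README.md) documents the new `plans` and `plan_thoughts` keys in the dictionary returned by `chat_with_workflow`. A minimal sketch of consuming that dictionary, assuming a no-argument `VisionAgentCoder` constructor and the `role`/`content`/`media` conversation format described in the surrounding text; the output file name is illustrative only.

```python
import vision_agent as va

agent = va.agent.VisionAgentCoder()
conv = [
    {
        "role": "user",
        "content": "Can you write code to count the apples in this image?",
        "media": ["apples.jpg"],
    }
]
result = agent.chat_with_workflow(conv)

# Inspect the planning output, then persist the generated code and its test so
# they can be rerun later without calling the agent again.
print(result["plan_thoughts"])
with open("generated_code.py", "w") as f:
    f.write(result["code"] + "\n\n" + result["test"])
```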
@@ -166,20 +171,25 @@ result = agent.chat_with_workflow(conv)
 ### Tools
 There are a variety of tools for the model or the user to use. Some are executed
 locally while others are hosted for you. You can easily access them yourself, for example if
-you want to run `owl_v2` and visualize the output you can run:
+you want to run `owl_v2_image` and visualize the output you can run:

 ```python
 import vision_agent.tools as T
 import matplotlib.pyplot as plt

 image = T.load_image("dogs.jpg")
-dets = T.owl_v2("dogs", image)
+dets = T.owl_v2_image("dogs", image)
 viz = T.overlay_bounding_boxes(image, dets)
 plt.imshow(viz)
 plt.show()
 ```

-You can also add custom tools to the agent:
+You can find all available tools in `vision_agent/tools/tools.py`; however,
+`VisionAgentCoder` only uses a subset of tools that have been tested and provide
+the best performance. Those can be found in the same file under the `TOOLS` variable.
+
+If you can't find the tool you are looking for, you can also add custom tools to the
+agent:

 ```python
 import vision_agent as va
@@ -214,9 +224,48 @@ Can't find the tool you need and want add it to `VisionAgent`? Check out our
 we add the source code for all the tools used in `VisionAgent`.

 ## Additional Backends
+### Anthropic
+`AnthropicVisionAgentCoder` uses Anthropic models. To get started you just need to get
+an Anthropic API key and set it in your environment variables:
+
+```bash
+export ANTHROPIC_API_KEY="your-api-key"
+```
+
+Because Anthropic does not offer embedding models, the default embedding model used
+is the OpenAI model, so you will also need to set your OpenAI API key:
+
+```bash
+export OPENAI_API_KEY="your-api-key"
+```
+
+Usage is the same as `VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.AnthropicVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+### OpenAI
+`OpenAIVisionAgentCoder` uses OpenAI models. To get started you just need to get an
+OpenAI API key and set it in your environment variables:
+
+```bash
+export OPENAI_API_KEY="your-api-key"
+```
+
+Usage is the same as `VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.OpenAIVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+
 ### Ollama
-We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
-a few models:
+`OllamaVisionAgentCoder` uses Ollama. To get started you must download a few models:

 ```bash
 ollama pull llama3.1
@@ -237,9 +286,8 @@ tools. You can use it just like you would use `VisionAgentCoder`:
 > WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.

 ### Azure OpenAI
-We also provide a `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
-follow the Azure Setup section below. You can use it just like you would use=
-`VisionAgentCoder`:
+`AzureVisionAgentCoder` uses Azure OpenAI models. To get started follow the Azure Setup
+section below. You can use it just like you would use `VisionAgentCoder`:

 ```python
 >>> import vision_agent as va
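To tie the new backend sections together, here is a minimal end-to-end sketch of the Anthropic backend added by both files above. The key values are placeholders, setting them from Python instead of the shell is only one option, and the no-argument constructor is taken from the usage snippets in the patch.

```python
import os

# Placeholder key values; AnthropicVisionAgentCoder uses Claude for reasoning and
# still needs an OpenAI key for the embeddings used in tool search (see above).
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
os.environ["OPENAI_API_KEY"] = "your-api-key"

import vision_agent as va  # imported after the keys are set, in case they are read at import time

agent = va.agent.AnthropicVisionAgentCoder()
agent("Count the apples in the image", media="apples.jpg")
```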