Add Conversation Agent (#171)
* separated utils into file

* add orchestrator

* add swe-agent tools

* updated orchestrator, moved tools

* strip extra char from error msg

* changed orchestrator to vision agent and vision agent to vision agent coder

* changed orch tools to meta tools

* removed old files

* fixed zmq cleanup warning

* vision agent uses code interpreter

* added more tools

* added more examples, fixed chat

* added eof text

* added directory info

* code exec needs to keep state

* format fix

* need to start kernel on init

* logging fix, send traceback

* add example chat app

* fix type errors

* mypy, flake8 fixes

* fix type issue

* updated docs

* added tool description func

* fix retries on planning and logging

* multi plan

* don't test multi plan on edit code

* fix flake8

* added zmq logging

* flake8

* mypy

* updated readme

* added citation

* add stylizing

* fixed plan testing prompt

* fix names of tabs

* better formatting for obs

* add image viewing

* add log_progress

* spelling mistakes

* updated readme

* fixed docs
dillonalaird authored Jul 29, 2024
1 parent 1b32e94 commit 530ba3b
Showing 25 changed files with 2,380 additions and 1,294 deletions.
15 changes: 15 additions & 0 deletions CITATION.cff
@@ -0,0 +1,15 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Laird"
given-names: "Dillon"
- family-names: "Jagadeesan"
given-names: "Shankar"
- family-names: "Cao"
given-names: "Yazhou"
- family-names: "Ng"
given-names: "Andrew"
title: "Vision Agent"
version: 0.2
date-released: 2024-02-12
url: "https://github.com/landing-ai/vision-agent"
52 changes: 41 additions & 11 deletions README.md
@@ -18,7 +18,7 @@ code to solve the task for them. Check out our discord for updates and roadmaps!

## Web Application

Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
Try Vision Agent live on [va.landing.ai](https://va.landing.ai/) (note this may not be running the most up-to-date version)

## Documentation

@@ -40,16 +40,44 @@ using Azure OpenAI please see the Azure setup section):
export OPENAI_API_KEY="your-api-key"
```

### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credit, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits)

### Vision Agent
There are two agents that you can use. Vision Agent is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. VisionAgentCoder is an agent that can write
code for vision tasks, such as counting people in an image. However, it cannot converse
and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
code.

#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
To run the streamlit app locally to chat with Vision Agent, you can run the following
command:

```bash
pip install -r examples/chat/requirements.txt
export WORKSPACE=/path/to/your/workspace
export ZMQ_PORT=5555
streamlit run examples/chat/app.py
```
You can find more details about the streamlit app [here](examples/chat/).

#### Basic Programmatic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -90,7 +118,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -180,9 +208,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses. Global
variables will not be captured by `register_tool`, so you need to include them in the
function. Make sure the documentation is in the same format as above, with a description,
`Parameters:`, `Returns:`, and `Example\n-------`, since this is what the agent uses to
pick and use the tool. You can find an example use case [here](examples/custom_tools/).
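
For reference, here is a minimal sketch of what a registered custom tool could look like in that docstring format. This is an illustration only: it assumes `register_tool` accepts a list of import statements (as the text above suggests), and the docstring wording and return type are placeholders rather than text taken verbatim from this commit:

```python
import numpy as np
import vision_agent as va


# A hypothetical custom tool. The docstring follows the required format
# (description, Parameters:, Returns:, Example) so the agent can pick and use it.
@va.tools.register_tool(imports=["import numpy as np"])
def custom_tool(image_path: str) -> np.ndarray:
    """Returns a placeholder 10x10 array for the given image.

    Parameters:
        image_path (str): The path to the image.

    Returns:
        np.ndarray: A 10x10 array of zeros.

    Example
    -------
    >>> custom_tool("image.jpg")
    """
    return np.zeros((10, 10))
```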

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -209,7 +239,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
@@ -218,7 +248,7 @@ agent = va.agent.AzureVisionAgent()

#### How to get started with OpenAI API credits

1. Visit the[OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
1. Visit the [OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

4 changes: 4 additions & 0 deletions docs/api/agent.md
@@ -1,3 +1,7 @@
::: vision_agent.agent.agent.Agent

::: vision_agent.agent.vision_agent.VisionAgent

::: vision_agent.agent.vision_agent_coder.VisionAgentCoder

::: vision_agent.agent.vision_agent_coder.AzureVisionAgentCoder
4 changes: 4 additions & 0 deletions docs/api/lmm.md
@@ -1,3 +1,7 @@
::: vision_agent.lmm.OpenAILMM

::: vision_agent.lmm.AzureOpenAILMM

::: vision_agent.lmm.OllamaLMM

::: vision_agent.lmm.ClaudeSonnetLMM
52 changes: 41 additions & 11 deletions docs/index.md
@@ -10,7 +10,7 @@ code to solve the task for them. Check out our discord for updates and roadmaps!

## Web Application

Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
Try Vision Agent live on [va.landing.ai](https://va.landing.ai/) (note this may not be running the most up-to-date version)

## Documentation

@@ -32,16 +32,44 @@ using Azure OpenAI please see the Azure setup section):
export OPENAI_API_KEY="your-api-key"
```

### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credit, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits)

### Vision Agent
There are two agents that you can use. Vision Agent is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. VisionAgentCoder is an agent that can write
code for vision tasks, such as counting people in an image. However, it cannot converse
and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
code.

#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
To run the streamlit app locally to chat with Vision Agent, you can run the following
command:

```bash
pip install -r examples/chat/requirements.txt
export WORKSPACE=/path/to/your/workspace
export ZMQ_PORT=5555
streamlit run examples/chat/app.py
```
You can find more details about the streamlit app [here](examples/chat/).

#### Basic Programmatic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -82,7 +110,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -172,9 +200,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses. Global
variables will not be captured by `register_tool`, so you need to include them in the
function. Make sure the documentation is in the same format as above, with a description,
`Parameters:`, `Returns:`, and `Example\n-------`, since this is what the agent uses to
pick and use the tool. You can find an example use case [here](examples/custom_tools/).

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -201,7 +231,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
@@ -210,7 +240,7 @@ agent = va.agent.AzureVisionAgent()

#### How to get started with OpenAI API credits

1. Visit the[OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
1. Visit the [OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

51 changes: 51 additions & 0 deletions examples/chat/README.md
@@ -0,0 +1,51 @@
# Vision Agent Chat Application

The Vision Agent chat application allows you to have conversations with the agent system
to accomplish a wider variety of tasks.

## Get Started
To get started first install the requirements by running the following command:
```bash
pip install -r requirements.txt
```

There are two environment variables you must set. The first is `WORKSPACE`, which is
where the agent will look for and write files:
```bash
export WORKSPACE=/path/to/your/workspace
```

The second is `ZMQ_PORT`, which is the port the agent uses to collect logs from the
subprocesses it runs when writing code:
```bash
export ZMQ_PORT=5555
```

Finally, you can launch the app with the following command:
```bash
streamlit run app.py
```

You can upload an image to your workspace in the first tab of the right column, then ask
the agent to do a task (be sure to include which image you want it to use for testing),
for example:
```
Can you count the number of people in this image? Use image.jpg for testing.
```

## Layout
There are two columns, left and right, each with two tabs.

`Chat`, the first tab of the left column, is where you can chat with Vision Agent. It can
answer your questions and execute Python code on your behalf. Note that if you ask it to
generate vision code, it may take a while to run.

`Code Execution Logs`, the second tab of the left column, is where you will see
intermediate logs while Vision Agent is generating vision code. Because code generation
can take some time, you can monitor this tab to see what the agent is doing.

`File Browser`, the first tab of the right column, is where you can see the files in your
workspace.

`Code Editor`, the second tab of the right column, is where you can examine code files
the agent has written. You can also modify the code and save it if the code is incorrect.
