Add Conversation Agent (#171)
* separated utils into file

* add orchestrator

* add swe-agent tools

* updated orchestrator, moved tools

* strip extra char from error msg

* changed orchestrator to vision agent and vision agent to vision agent coder

* changed orch tools to meta tools

* removed old files

* fixed zmq cleanup warning

* vision agent uses code interpreter

* added more tools

* added more examples, fixed chat

* added eof text

* added directory info

* code exec needs to keep state

* format fix

* need to start kernel on init

* logging fix, send traceback

* add example chat app

* fix type errors

* mypy, flake8 fixes

* fix type issue

* updated docs

* added tool description func

* fix retries on planning and logging

* multi plan

* don't test multi plan on edit code

* fix flake8

* added zmq logging

* flake8

* mypy

* updated readme

* added citation

* add stylizing

* fixed plan testing prompt

* fix names of tabs

* better formatting for obs

* add image viewing

* add log_progress

* spelling mistakes

* updated readme

* fixed docs
dillonalaird authored Jul 29, 2024
1 parent 1b32e94 commit 530ba3b
Showing 25 changed files with 2,380 additions and 1,294 deletions.
15 changes: 15 additions & 0 deletions CITATION.cff
@@ -0,0 +1,15 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Laird"
given-names: "Dillon"
- family-names: "Jagadeesan"
given-names: "Shankar"
- family-names: "Cao"
given-names: "Yazhou"
- family-names: "Ng"
given-names: "Andrew"
title: "Vision Agent"
version: 0.2
date-released: 2024-02-12
url: "https://github.com/landing-ai/vision-agent"
52 changes: 41 additions & 11 deletions README.md
@@ -18,7 +18,7 @@ code to solve the task for them. Check out our discord for updates and roadmaps!

## Web Application

Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
Try Vision Agent live on [va.landing.ai](https://va.landing.ai/) (note this may not be running the most up-to-date version)

## Documentation

@@ -40,16 +40,44 @@ using Azure OpenAI please see the Azure setup section):
export OPENAI_API_KEY="your-api-key"
```

### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credit, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits)

### Vision Agent
There are two agents that you can use. Vision Agent is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. VisionAgentCoder is an agent that can write
code for vision tasks, such as counting people in an image. However, it cannot converse
and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
code.

#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
To run the streamlit app locally to chat with Vision Agent, you can run the following
command:

```bash
pip install -r examples/chat/requirements.txt
export WORKSPACE=/path/to/your/workspace
export ZMQ_PORT=5555
streamlit run examples/chat/app.py
```
You can find more details about the streamlit app [here](examples/chat/).

#### Basic Programmatic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -90,7 +118,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -180,9 +208,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses. Global
variables will not be captured by `register_tool`, so you need to include them in the
function. Make sure the documentation is in the same format as above, with a description,
`Parameters:`, `Returns:`, and `Example\n-------`, since this is what the agent uses to
pick and use the tool. You can find an example use case [here](examples/custom_tools/).
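
For reference, here is a minimal sketch of what a registered custom tool could look like in that docstring format. This is an illustration only: it assumes `register_tool` accepts a list of import statements (as the text above suggests), and the docstring wording and return type are placeholders rather than text taken verbatim from this commit:

```python
import numpy as np
import vision_agent as va


# A hypothetical custom tool. The docstring follows the required format
# (description, Parameters:, Returns:, Example) so the agent can pick and use it.
@va.tools.register_tool(imports=["import numpy as np"])
def custom_tool(image_path: str) -> np.ndarray:
    """Returns a placeholder 10x10 array for the given image.

    Parameters:
        image_path (str): The path to the image.

    Returns:
        np.ndarray: A 10x10 array of zeros.

    Example
    -------
    >>> custom_tool("image.jpg")
    """
    return np.zeros((10, 10))
```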

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -209,7 +239,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
@@ -218,7 +248,7 @@ agent = va.agent.AzureVisionAgent()

#### How to get started with OpenAI API credits

1. Visit the[OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
1. Visit the [OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

4 changes: 4 additions & 0 deletions docs/api/agent.md
@@ -1,3 +1,7 @@
::: vision_agent.agent.agent.Agent

::: vision_agent.agent.vision_agent.VisionAgent

::: vision_agent.agent.vision_agent_coder.VisionAgentCoder

::: vision_agent.agent.vision_agent_coder.AzureVisionAgentCoder
4 changes: 4 additions & 0 deletions docs/api/lmm.md
@@ -1,3 +1,7 @@
::: vision_agent.lmm.OpenAILMM

::: vision_agent.lmm.AzureOpenAILMM

::: vision_agent.lmm.OllamaLMM

::: vision_agent.lmm.ClaudeSonnetLMM
52 changes: 41 additions & 11 deletions docs/index.md
@@ -10,7 +10,7 @@ code to solve the task for them. Check out our discord for updates and roadmaps!

## Web Application

Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
Try Vision Agent live on [va.landing.ai](https://va.landing.ai/) (note this may not be running the most up-to-date version)

## Documentation

@@ -32,16 +32,44 @@ using Azure OpenAI please see the Azure setup section):
export OPENAI_API_KEY="your-api-key"
```

### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credit, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits)

### Vision Agent
There are two agents that you can use. Vision Agent is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. VisionAgentCoder is an agent that can write
code for vision tasks, such as counting people in an image. However, it cannot converse
and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
code.

#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
To run the streamlit app locally to chat with Vision Agent, you can run the following
command:

```bash
pip install -r examples/chat/requirements.txt
export WORKSPACE=/path/to/your/workspace
export ZMQ_PORT=5555
streamlit run examples/chat/app.py
```
You can find more details about the streamlit app [here](examples/chat/).

#### Basic Programmatic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -82,7 +110,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -172,9 +200,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses. Global
variables will not be captured by `register_tool`, so you need to include them in the
function. Make sure the documentation is in the same format as above, with a description,
`Parameters:`, `Returns:`, and `Example\n-------`, since this is what the agent uses to
pick and use the tool. You can find an example use case [here](examples/custom_tools/).

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -201,7 +231,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
@@ -210,7 +240,7 @@ agent = va.agent.AzureVisionAgent()

#### How to get started with OpenAI API credits

1. Visit the[OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
1. Visit the [OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

51 changes: 51 additions & 0 deletions examples/chat/README.md
@@ -0,0 +1,51 @@
# Vision Agent Chat Application

The Vision Agent chat application allows you to have conversations with the agent system
to accomplish a wider variety of tasks.

## Get Started
To get started first install the requirements by running the following command:
```bash
pip install -r requirements.txt
```

There are two environment variables you must set. The first is `WORKSPACE`, which is
where the agent will look for and write files:
```bash
export WORKSPACE=/path/to/your/workspace
```

The second is `ZMQ_PORT`, which is the port the agent uses to collect logs from the
subprocesses it runs when writing code:
```bash
export ZMQ_PORT=5555
```

Finally, you can launch the app with the following command:
```bash
streamlit run app.py
```

You can upload an image to your workspace in the first tab of the right column, then ask
the agent to do a task (be sure to include which image you want it to use for testing),
for example:
```
Can you count the number of people in this image? Use image.jpg for testing.
```

## Layout
There are two columns, left and right, each with two tabs.

`Chat`, the first tab of the left column, is where you can chat with Vision Agent. It can
answer your questions and execute Python code on your behalf. Note that if you ask it to
generate vision code, it may take a while to run.

`Code Execution Logs`, the second tab of the left column, is where you will see
intermediate logs while Vision Agent is generating vision code. Because code generation
can take some time, you can monitor this tab to see what the agent is doing.

`File Browser`, the first tab of the right column, is where you can see the files in your
workspace.

`Code Editor`, the second tab of the right column, is where you can examine code files
the agent has written. You can also modify the code and save it if the code is incorrect.
