Add Conversation Agent #171

Merged: 42 commits, Jul 29, 2024
Commits:
b4e2c32 separated utils into file (dillonalaird, Jun 27, 2024)
4df44f9 add orchestrator (dillonalaird, Jun 27, 2024)
cda4309 add swe-agent tools (dillonalaird, Jun 27, 2024)
1abba17 updated orchestrator, moved tools (dillonalaird, Jun 27, 2024)
bdbb7d2 strip extra char from error msg (dillonalaird, Jul 2, 2024)
d7f44ca changed orchestrator to vision agent and vision agent to vision agent… (dillonalaird, Jul 11, 2024)
dbcbd01 changed orch tools to meta toosl (dillonalaird, Jul 11, 2024)
2962aba removed old files (dillonalaird, Jul 11, 2024)
8bae420 fixed zmq cleanup warning (dillonalaird, Jul 11, 2024)
e734351 vision agent uses code interpreter (dillonalaird, Jul 11, 2024)
7697ef3 added more tools (dillonalaird, Jul 12, 2024)
7c1ec4e added more examples, fixed chat (dillonalaird, Jul 14, 2024)
7d30cc9 added eof text (dillonalaird, Jul 14, 2024)
761f89d added directory info (dillonalaird, Jul 14, 2024)
2dc1fa7 code exec needs to keep state (dillonalaird, Jul 14, 2024)
e70ebd8 format fix (dillonalaird, Jul 14, 2024)
2466975 need to start kernel on init (dillonalaird, Jul 15, 2024)
6e00bd0 logging fix, send traceback (dillonalaird, Jul 15, 2024)
bd3051b add example chat app (dillonalaird, Jul 15, 2024)
e5a2c11 fix type errors (dillonalaird, Jul 16, 2024)
60c672b mypy, flake8 fixes (dillonalaird, Jul 16, 2024)
d3716a9 fix type issue' (dillonalaird, Jul 16, 2024)
4ad0ed0 updated docs; (dillonalaird, Jul 16, 2024)
b14fa4e added tool description func (dillonalaird, Jul 17, 2024)
99c5fe0 fix retries on planning and logging (dillonalaird, Jul 17, 2024)
915f635 multi plan (dillonalaird, Jul 19, 2024)
c3db097 don't test multi plan on edit code (dillonalaird, Jul 19, 2024)
7fe2949 fix flake8 (dillonalaird, Jul 23, 2024)
9d96ee4 added zmq logging (dillonalaird, Jul 24, 2024)
05e1aec flake8 (dillonalaird, Jul 24, 2024)
661a948 mypy (dillonalaird, Jul 24, 2024)
af2cc88 updated readme (dillonalaird, Jul 24, 2024)
2977ff4 added citation (dillonalaird, Jul 24, 2024)
1a4eec9 add stylizing (dillonalaird, Jul 25, 2024)
75dab2d fixed plan testing prompt (dillonalaird, Jul 25, 2024)
1aaa6d0 fix names of tabs (dillonalaird, Jul 26, 2024)
8466234 better formatting for obs (dillonalaird, Jul 26, 2024)
53a127f add image viewing (dillonalaird, Jul 26, 2024)
38bf48e add log_progress (dillonalaird, Jul 27, 2024)
e1c8f92 spelling mistakes (dillonalaird, Jul 27, 2024)
3136d32 updated readme (dillonalaird, Jul 27, 2024)
406f55f fixed docs (dillonalaird, Jul 27, 2024)
Files changed:
15 changes: 15 additions & 0 deletions CITATION.cff
@@ -0,0 +1,15 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Laird"
  given-names: "Dillon"
- family-names: "Jagadeesan"
  given-names: "Shankar"
- family-names: "Cao"
  given-names: "Yazhou"
- family-names: "Ng"
  given-names: "Andrew"
title: "Vision Agent"
version: 0.2
date-released: 2024-02-12
url: "https://github.com/landing-ai/vision-agent"
52 changes: 41 additions & 11 deletions README.md
@@ -18,7 +18,7 @@ code to solve the task for them. Check out our discord for updates and roadmaps!

## Web Application

Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
Try Vision Agent live on [va.landing.ai](https://va.landing.ai/) (note that this may not be running the most up-to-date version)

## Documentation

@@ -40,16 +40,44 @@ using Azure OpenAI please see the Azure setup section):
export OPENAI_API_KEY="your-api-key"
```

### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (a minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credits, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits).

### Vision Agent
There are two agents that you can use. Vision Agent is a conversational agent that
has access to tools that allow it to write and navigate Python code and file
systems. It can converse with the user in natural language. VisionAgentCoder is an
agent that can write code for vision tasks, such as counting people in an image.
However, it cannot converse and can only respond with code. VisionAgent can call
VisionAgentCoder to write vision code.

#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
To run the Streamlit app locally and chat with Vision Agent, run the following
command:

```bash
pip install -r examples/chat/requirements.txt
export WORKSPACE=/path/to/your/workspace
export ZMQ_PORT=5555
streamlit run examples/chat/app.py
```
You can find more details about the Streamlit app [here](examples/chat/).

#### Basic Programmatic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -90,7 +118,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -180,9 +208,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses.
Global variables will not be captured by `register_tool`, so you need to include
them in the function. Make sure the documentation is in the same format as above,
with a description, `Parameters:`, `Returns:`, and `Example\n-------`, since this
is what the agent uses to pick and use the tool. You can find an example use case
[here](examples/custom_tools/).
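
For illustration, a minimal sketch of a registered tool might look like the
following. The `imports` keyword and the `dummy_detector` tool are assumptions
for this example; check `va.tools.register_tool` in the source for the exact
signature:

```python
import numpy as np

import vision_agent as va


@va.tools.register_tool(imports=["import numpy as np"])
def dummy_detector(image_path: str) -> np.ndarray:
    """Returns a placeholder 10x10 detection mask for the given image.

    Parameters:
        image_path (str): The path to the image file.

    Returns:
        np.ndarray: A 10x10 array of zeros standing in for real output.

    Example
    -------
        >>> dummy_detector("image.jpg")
    """
    # The import is repeated in the `imports` list above so the tool stays
    # self-contained when the agent executes it in a separate process.
    return np.zeros((10, 10))
```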

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -209,7 +239,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
@@ -218,7 +248,7 @@ agent = va.agent.AzureVisionAgent()

#### How to get started with OpenAI API credits

1. Visit the[OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
1. Visit the [OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

4 changes: 4 additions & 0 deletions docs/api/agent.md
@@ -1,3 +1,7 @@
::: vision_agent.agent.agent.Agent

::: vision_agent.agent.vision_agent.VisionAgent

::: vision_agent.agent.vision_agent_coder.VisionAgentCoder

::: vision_agent.agent.vision_agent_coder.AzureVisionAgentCoder
4 changes: 4 additions & 0 deletions docs/api/lmm.md
@@ -1,3 +1,7 @@
::: vision_agent.lmm.OpenAILMM

::: vision_agent.lmm.AzureOpenAILMM

::: vision_agent.lmm.OllamaLMM

::: vision_agent.lmm.ClaudeSonnetLMM
52 changes: 41 additions & 11 deletions docs/index.md
@@ -10,7 +10,7 @@ code to solve the task for them. Check out our discord for updates and roadmaps!

## Web Application

Try Vision Agent live on [va.landing.ai](https://va.landing.ai/)
Try Vision Agent live on [va.landing.ai](https://va.landing.ai/) (note that this may not be running the most up-to-date version)

## Documentation

@@ -32,16 +32,44 @@ using Azure OpenAI please see the Azure setup section):
export OPENAI_API_KEY="your-api-key"
```

### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (a minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credits, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits).

### Vision Agent
There are two agents that you can use. Vision Agent is a conversational agent that
has access to tools that allow it to write and navigate Python code and file
systems. It can converse with the user in natural language. VisionAgentCoder is an
agent that can write code for vision tasks, such as counting people in an image.
However, it cannot converse and can only respond with code. VisionAgent can call
VisionAgentCoder to write vision code.

#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
To run the Streamlit app locally and chat with Vision Agent, run the following
command:

```bash
pip install -r examples/chat/requirements.txt
export WORKSPACE=/path/to/your/workspace
export ZMQ_PORT=5555
streamlit run examples/chat/app.py
```
You can find more details about the Streamlit app [here](examples/chat/).

#### Basic Programmatic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -82,7 +110,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -172,9 +200,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses.
Global variables will not be captured by `register_tool`, so you need to include
them in the function. Make sure the documentation is in the same format as above,
with a description, `Parameters:`, `Returns:`, and `Example\n-------`, since this
is what the agent uses to pick and use the tool. You can find an example use case
[here](examples/custom_tools/).
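
For illustration, a minimal sketch of a registered tool might look like the
following. The `imports` keyword and the `dummy_detector` tool are assumptions
for this example; check `va.tools.register_tool` in the source for the exact
signature:

```python
import numpy as np

import vision_agent as va


@va.tools.register_tool(imports=["import numpy as np"])
def dummy_detector(image_path: str) -> np.ndarray:
    """Returns a placeholder 10x10 detection mask for the given image.

    Parameters:
        image_path (str): The path to the image file.

    Returns:
        np.ndarray: A 10x10 array of zeros standing in for real output.

    Example
    -------
        >>> dummy_detector("image.jpg")
    """
    # The import is repeated in the `imports` list above so the tool stays
    # self-contained when the agent executes it in a separate process.
    return np.zeros((10, 10))
```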

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -201,7 +231,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
@@ -210,7 +240,7 @@ agent = va.agent.AzureVisionAgent()

#### How to get started with OpenAI API credits

1. Visit the[OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
1. Visit the [OpenAI API platform](https://beta.openai.com/signup/) to sign up for an API key.
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

51 changes: 51 additions & 0 deletions examples/chat/README.md
@@ -0,0 +1,51 @@
# Vision Agent Chat Application

The Vision Agent chat application allows you to have conversations with the agent
system to accomplish a wider variety of tasks.

## Get Started
To get started, first install the requirements by running the following command:
```bash
pip install -r requirements.txt
```

There are two environment variables you must set. The first is `WORKSPACE`, which
is where the agent will look for and write files:
```bash
export WORKSPACE=/path/to/your/workspace
```

The second is `ZMQ_PORT`, which is the port the agent uses to collect logs from
the subprocesses it runs when writing code:
```bash
export ZMQ_PORT=5555
```
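
For context, here is a minimal sketch of how log collection over `ZMQ_PORT`
could work. This is illustrative only, not the app's actual implementation: a
PULL socket on the chat-app side gathers records that a subprocess sends on a
PUSH socket.

```python
# Illustrative sketch of ZMQ log collection (the actual wiring lives
# inside vision-agent and the chat app).
import os

import zmq

port = os.environ.get("ZMQ_PORT", "5555")
ctx = zmq.Context()

# Chat-app side: bind and wait for log records.
pull = ctx.socket(zmq.PULL)
pull.bind(f"tcp://*:{port}")

# Subprocess side: connect and push a record.
push = ctx.socket(zmq.PUSH)
push.connect(f"tcp://localhost:{port}")
push.send_json({"level": "INFO", "message": "generated vision code"})

print(pull.recv_json())  # {'level': 'INFO', 'message': 'generated vision code'}
```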

Finally, you can launch the app with the following command:
```bash
streamlit run app.py
```

You can upload an image to your workspace in the first tab of the right column,
then ask the agent to do a task (be sure to mention which image you want it to use
for testing), for example:
```
Can you count the number of people in this image? Use image.jpg for testing.
```

## Layout
There are two columns, left and right, each with two tabs.

`Chat`, the first tab of the left column, is where you can chat with Vision Agent.
It can answer your questions and execute Python code on your behalf. Note that if
you ask it to generate vision code, it may take a while to run.

`Code Execution Logs`, the second tab of the left column, is where you will see
intermediate logs while Vision Agent is generating vision code. Because code
generation can take some time, you can monitor this tab to see what the agent is
doing.

`File Browser`, the first tab of the right column, is where you can see the files
in your workspace.

`Code Editor`, the second tab of the right column, is where you can examine code
files the agent has written. You can also modify the code and save it in case the
code is incorrect.