Commit

Merge branch 'main' of github.com:landing-ai/vision-agent into add_count_tool
Dayof committed Sep 2, 2024
2 parents d51344b + e699553 commit 437c8b1
Showing 19 changed files with 863 additions and 645 deletions.
18 changes: 11 additions & 7 deletions README.md
@@ -41,15 +41,15 @@ export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
 access to tools that allow it to write and navigate python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.
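
A minimal sketch of the distinction (assuming `vision_agent` imports as `va` and `OPENAI_API_KEY` is set; the task and image path here are invented for illustration):

```python
import vision_agent as va

# VisionAgentCoder does not chat; it takes one vision task and returns code.
coder = va.agent.VisionAgentCoder(verbosity=2)
code = coder("Count the number of people in this image", media="people.jpg")
print(code)
```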

#### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
@@ -146,7 +146,7 @@ the code and having it update. You just need to add the code as a response from
assistant:

```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
@@ -212,6 +212,10 @@ function. Make sure the documentation is in the same format above with descripti
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository where
+we add the source code for all the tools used in `VisionAgent`.
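
To make the expected format concrete, here is a hypothetical custom tool; the function and its logic are invented, and only the docstring layout (`Parameters:`, `Returns:`, `Example\n-------`) follows the convention described above:

```python
import numpy as np


def count_red_pixels(image: np.ndarray) -> int:
    """'count_red_pixels' counts the pixels in an RGB image whose red
    channel is strictly greater than both the green and blue channels.

    Parameters:
        image (np.ndarray): The input image in RGB channel order.

    Returns:
        int: The number of red-dominant pixels.

    Example
    -------
        >>> count_red_pixels(image)
        1042
    """
    red, green, blue = image[..., 0], image[..., 1], image[..., 2]
    return int(np.sum((red > green) & (red > blue)))
```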

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
21 changes: 13 additions & 8 deletions docs/index.md
@@ -38,15 +38,15 @@ export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
 access to tools that allow it to write and navigate python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.

#### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
@@ -143,7 +143,7 @@ the code and having it update. You just need to add the code as a response from
assistant:

```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
@@ -209,6 +209,10 @@ function. Make sure the documentation is in the same format above with descripti
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository where
+we add the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
@@ -230,6 +234,7 @@ tools. You can use it just like you would use `VisionAgentCoder`:
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
+> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
### Azure OpenAI
We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
@@ -241,7 +246,7 @@ follow the Azure Setup section below. You can use it just like you would use
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
-> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
16 changes: 14 additions & 2 deletions examples/chat/app.py
@@ -26,7 +26,14 @@
"response": "saved",
"style": {"bottom": "calc(50% - 4.25rem", "right": "0.4rem"},
}
agent = va.agent.VisionAgent(verbosity=1)
# set artifacts remote_path to WORKSPACE
artifacts = va.tools.Artifacts(WORKSPACE / "artifacts.pkl")
if Path("artifacts.pkl").exists():
artifacts.load("artifacts.pkl")
else:
artifacts.save("artifacts.pkl")

agent = va.agent.VisionAgent(verbosity=1, local_artifacts_path="artifacts.pkl")

st.set_page_config(layout="wide")

@@ -44,7 +51,9 @@


def update_messages(messages, lock):
-    new_chat = agent.chat_with_code(messages)
+    if Path("artifacts.pkl").exists():
+        artifacts.load("artifacts.pkl")
+    new_chat, _ = agent.chat_with_code(messages, artifacts=artifacts)
    with lock:
        for new_message in new_chat:
            if new_message not in messages:
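
For reference, the updated call can be exercised outside Streamlit as well; a minimal sketch using the names from this diff (the message content and file names are assumptions):

```python
import vision_agent as va

# Artifacts persists files the agent reads and writes across turns.
artifacts = va.tools.Artifacts("artifacts.pkl")
agent = va.agent.VisionAgent(verbosity=1, local_artifacts_path="artifacts.pkl")

messages = [{"role": "user", "content": "Count the people in people.jpg"}]
# chat_with_code now returns a pair; the caller keeps the updated chat
# and discards the second value.
new_chat, _ = agent.chat_with_code(messages, artifacts=artifacts)
```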
@@ -122,6 +131,9 @@ def main():
    with open(WORKSPACE / uploaded_file.name, "wb") as f:
        f.write(uploaded_file.getbuffer())

+    # make it None so it won't load and overwrite the image
+    artifacts.artifacts[uploaded_file.name] = None
+
    for file in WORKSPACE.iterdir():
        if "__pycache__" not in str(file) and not str(file).startswith("."):
            if st.button(file.name):