
Commit

Update conversation (#217)
* update for new conv

* add artifact tools

* update local executor

* fix upload/download

* cleaned up code for artifacts

* starting artifact prompts

* app to add files to artifacts

* add support for artifacts

* add artifact meta tools

* ran isort

* prompt to work with artifacts

* minor fixes for prompts

* add docs, fix load and saving remote files

* rename prompts

* add docs for artifacts, allow None artifacts (which don't load) to be added

* e2b and local upload/download work similarly now, can pass in target download path

* add Artifacts to exports

* local chat app to work with artifacts

* updated docs

* fix flake8

* fix mypy errors

* fix format

* add execution to conversation

* fixed type errors

* fixed bug with upload file

* added ability to write media files to artifacts

* return outside of context

* make remote path execute variable

* add codec for video encoding

* fix prompts to include writing media artifacts

* isort

* fix typo

* added redisplay for nested notebook sessions

* return artifacts

* add trace for last edited artifact

* handle artifact return

* only add text to obs, no trace
dillonalaird authored Aug 30, 2024
1 parent 2842fdc commit cacae44
Showing 16 changed files with 560 additions and 401 deletions.
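The core of this commit is an `Artifacts` container that is pickled to disk and passed between the local chat app and the executor, with `None` entries acting as placeholders that are not loaded (so they never overwrite files already on disk). As a rough mental model, here is a simplified stand-in — the names and exact semantics are assumptions from the diffs below, not the actual `vision_agent.tools.Artifacts` implementation:

```python
import pickle
import tempfile
from pathlib import Path


class ArtifactStore:
    """Simplified stand-in for a pickle-backed artifacts container."""

    def __init__(self, remote_save_path):
        self.remote_save_path = Path(remote_save_path)
        # name -> file content, or None for a file that already exists
        # on disk and should not be loaded/overwritten
        self.artifacts = {}

    def save(self, local_path=None):
        path = Path(local_path) if local_path else self.remote_save_path
        with open(path, "wb") as f:
            pickle.dump(self.artifacts, f)

    def load(self, local_path):
        with open(local_path, "rb") as f:
            loaded = pickle.load(f)
        for name, content in loaded.items():
            # skip None entries so placeholders never clobber real files
            if content is not None:
                self.artifacts[name] = content


tmp = Path(tempfile.mkdtemp())
store = ArtifactStore(tmp / "artifacts.pkl")
store.artifacts["notes.txt"] = "hello"
store.save()

fresh = ArtifactStore(tmp / "artifacts.pkl")
fresh.artifacts["image.png"] = None  # uploaded file: placeholder only
fresh.load(tmp / "artifacts.pkl")
```

The round-trip through a single pickle file is what lets the Streamlit app and the agent share state without a database.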
18 changes: 11 additions & 7 deletions README.md
@@ -41,15 +41,15 @@ export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.

#### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
@@ -146,7 +146,7 @@ the code and having it update. You just need to add the code as a response from
assistant:

```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
Expand Down Expand Up @@ -212,6 +212,10 @@ function. Make sure the documentation is in the same format above with descripti
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository where
+we add the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
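One of the commit bullets above notes that the e2b and local executors now upload and download files the same way, and that the caller can pass a target download path. For a local executor that contract can be as simple as two path-preserving copies — the sketch below is a hypothetical stand-alone version, not the actual `vision_agent` executor API:

```python
import shutil
import tempfile
from pathlib import Path


def upload_file(local_path, remote_dir):
    """Copy a local file into the executor's workspace directory."""
    remote_dir = Path(remote_dir)
    remote_dir.mkdir(parents=True, exist_ok=True)
    dest = remote_dir / Path(local_path).name
    shutil.copy(local_path, dest)
    return dest


def download_file(remote_path, target_path):
    """Copy a workspace file to a caller-chosen target download path."""
    target = Path(target_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(remote_path, target)
    return target


root = Path(tempfile.mkdtemp())
src = root / "src.txt"
src.write_text("payload")

remote = upload_file(src, root / "workspace")
local_copy = download_file(remote, root / "out" / "result.txt")
```

Returning the destination path from both calls keeps the local and remote (e2b) code paths interchangeable for callers.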
21 changes: 13 additions & 8 deletions docs/index.md
@@ -38,15 +38,15 @@ export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.

#### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
@@ -143,7 +143,7 @@ the code and having it update. You just need to add the code as a response from
assistant:

```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
Expand Down Expand Up @@ -209,6 +209,10 @@ function. Make sure the documentation is in the same format above with descripti
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository where
+we add the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
@@ -230,6 +234,7 @@ tools. You can use it just like you would use `VisionAgentCoder`:
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
+> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
### Azure OpenAI
We also provide a `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
@@ -241,7 +246,7 @@ follow the Azure Setup section below. You can use it just like you would use
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
-> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
16 changes: 14 additions & 2 deletions examples/chat/app.py
@@ -26,7 +26,14 @@
"response": "saved",
"style": {"bottom": "calc(50% - 4.25rem", "right": "0.4rem"},
}
-agent = va.agent.VisionAgent(verbosity=1)
+# set artifacts remote_path to WORKSPACE
+artifacts = va.tools.Artifacts(WORKSPACE / "artifacts.pkl")
+if Path("artifacts.pkl").exists():
+    artifacts.load("artifacts.pkl")
+else:
+    artifacts.save("artifacts.pkl")
+
+agent = va.agent.VisionAgent(verbosity=1, local_artifacts_path="artifacts.pkl")

st.set_page_config(layout="wide")

@@ -44,7 +51,9 @@


def update_messages(messages, lock):
-    new_chat = agent.chat_with_code(messages)
+    if Path("artifacts.pkl").exists():
+        artifacts.load("artifacts.pkl")
+    new_chat, _ = agent.chat_with_code(messages, artifacts=artifacts)
    with lock:
        for new_message in new_chat:
            if new_message not in messages:
@@ -122,6 +131,9 @@ def main():
        with open(WORKSPACE / uploaded_file.name, "wb") as f:
            f.write(uploaded_file.getbuffer())

+        # make it None so it won't load and overwrite the image
+        artifacts.artifacts[uploaded_file.name] = None
+
    for file in WORKSPACE.iterdir():
        if "__pycache__" not in str(file) and not str(file).startswith("."):
            if st.button(file.name):
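The `update_messages` hunk above merges the agent's returned chat history into the shared message list under a lock, appending only messages it has not already seen. Detached from `vision_agent` and Streamlit, the merge logic looks like this — a hypothetical stand-alone version where `agent.chat_with_code` is replaced by a plain input list:

```python
import threading


def merge_new_messages(messages, new_chat, lock):
    # Append only messages the shared history doesn't already contain,
    # holding the lock so concurrent UI callbacks don't race.
    with lock:
        for new_message in new_chat:
            if new_message not in messages:
                messages.append(new_message)


lock = threading.Lock()
messages = [{"role": "user", "content": "hi"}]
new_chat = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
]
merge_new_messages(messages, new_chat, lock)
```

The membership test keeps the merge idempotent, which matters because the agent returns the full conversation (old and new messages) rather than a delta.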
2 changes: 1 addition & 1 deletion vision_agent/agent/agent.py
@@ -11,7 +11,7 @@ def __call__(
        self,
        input: Union[str, List[Message]],
        media: Optional[Union[str, Path]] = None,
-    ) -> str:
+    ) -> Union[str, List[Message]]:
        pass

@abstractmethod
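The `agent.py` change widens `__call__`'s return type from `str` to `Union[str, List[Message]]`, so call sites that previously assumed a string now need a type check. A minimal sketch of how a caller might normalize both shapes, with `Message` modeled as a plain dict (an assumption consistent with the chat-message dicts elsewhere in this diff):

```python
from typing import Dict, List, Union

# Message modeled as a plain role/content dict for illustration
Message = Dict[str, str]


def response_text(result: Union[str, List[Message]]) -> str:
    """Collapse either return shape to displayable text."""
    if isinstance(result, str):
        return result
    return "\n".join(m.get("content", "") for m in result)


plain = response_text("done")
combined = response_text([{"role": "assistant", "content": "done"}])
```

Keeping the normalization in one helper means downstream display code never has to care which agent produced the result.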
