updated docs
dillonalaird committed Aug 28, 2024
1 parent 62e8a86 commit dbd55ae
Showing 3 changed files with 25 additions and 16 deletions.
README.md: 18 changes (11 additions & 7 deletions)
@@ -41,15 +41,15 @@

```bash
export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. `VisionAgentCoder` is an agent specifically
for writing code for vision tasks, such as counting people in an image. However, it
cannot chat with you and can only respond with code. `VisionAgent` can call
`VisionAgentCoder` to write vision code.

#### Basic Usage
To run the Streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
...
```

@@ -146,7 +146,7 @@
the code and having it update. You just need to add the code as a response from
assistant:

```python
agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
        ...
    },
]
```

@@ -212,6 +212,10 @@
function. Make sure the documentation is in the same format above with descriptions for
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
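As an illustration, a tool documented in that format might look like the sketch below (the function `area_of_mask` is hypothetical and not part of the library; only the docstring layout follows the requirements above):

```python
import numpy as np


def area_of_mask(mask: np.ndarray) -> float:
    """'area_of_mask' returns the number of foreground pixels in a binary mask.

    Parameters:
        mask: A binary numpy array where 1 marks the object and 0 the background.

    Returns:
        float: The area of the mask in pixels.

    Example
    -------
        >>> area_of_mask(np.array([[0, 1], [1, 1]]))
        3.0
    """
    return float(np.asarray(mask).sum())
```

The agent reads this docstring to decide when and how to call the tool, so the `Parameters:` and `Returns:` sections should describe every argument and the return value.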

Can't find the tool you need and want to add it to `VisionAgent`? Check out our
[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository, where
we keep the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
...
docs/index.md: 21 changes (13 additions & 8 deletions)
@@ -38,15 +38,15 @@

```bash
export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
converse with the user in natural language. `VisionAgentCoder` is an agent specifically
for writing code for vision tasks, such as counting people in an image. However, it
cannot chat with you and can only respond with code. `VisionAgent` can call
`VisionAgentCoder` to write vision code.

#### Basic Usage
To run the Streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
...
```

@@ -143,7 +143,7 @@
the code and having it update. You just need to add the code as a response from
assistant:

```python
agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
        ...
    },
]
```

@@ -209,6 +209,10 @@
function. Make sure the documentation is in the same format above with descriptions for
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

Can't find the tool you need and want to add it to `VisionAgent`? Check out our
[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository, where
we keep the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
...

@@ -230,6 +234,7 @@
tools. You can use it just like you would use `VisionAgentCoder`:

```python
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
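Before running the snippet above, the models have to be pulled locally; a sketch of that setup (the model names `llama3.1` and `mxbai-embed-large` are assumptions based on the library's docs at the time, so check them before running):

```shell
# Pull the LLM and the embedding model used by OllamaVisionAgentCoder
# (model names are assumptions; verify against the vision-agent docs).
ollama pull llama3.1
ollama pull mxbai-embed-large
```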
> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
### Azure OpenAI
We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
...

@@ -241,7 +246,7 @@
follow the Azure Setup section below. You can use it just like you would use
`VisionAgentCoder`:

```python
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
...
vision_agent/utils/image_utils.py: 2 changes (1 addition & 1 deletion)
@@ -70,7 +70,7 @@

```python
def rle_decode_array(rle: Dict[str, List[int]]) -> np.ndarray:
    r"""Decode a run-length encoded mask. Returns numpy array, 1 - mask, 0 - background.

    Parameters:
        rle: The run-length encoded mask.
    """
    size = rle["size"]
    counts = rle["counts"]
    ...
```
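To make the decoding concrete, here is a minimal, self-contained sketch of the same idea (an illustration, not the library's exact implementation; it assumes COCO-style runs stored in column-major order and starting with background):

```python
import numpy as np


def rle_decode(rle: dict) -> np.ndarray:
    """Decode a run-length encoded mask: 1 = mask, 0 = background."""
    size = rle["size"]
    counts = rle["counts"]
    # Runs alternate between background (0) and mask (1), starting with background.
    values = np.concatenate(
        [np.full(c, i % 2, dtype=np.uint8) for i, c in enumerate(counts)]
    )
    # COCO-style RLE stores pixels in column-major (Fortran) order.
    return values.reshape(size, order="F")


# 2x3 mask: 2 background pixels, then 3 mask pixels, then 1 background pixel.
mask = rle_decode({"size": [2, 3], "counts": [2, 3, 1]})
```

With `counts = [2, 3, 1]` on a 2x3 grid this yields three foreground pixels, filled down the columns rather than across the rows.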
