Commit

updated README
dillonalaird committed Aug 26, 2024
1 parent 74924e8 commit 41d4718
Showing 2 changed files with 99 additions and 30 deletions.
README.md: 62 changes (47 additions, 15 deletions)
@@ -168,20 +168,18 @@ result = agent.chat_with_workflow(conv)

### Tools
There are a variety of tools for the model or the user to use. Some are executed locally
while others are hosted for you. You can easily access them yourself; for example, if
you want to run `owl_v2` and visualize the output you can run:

```python
import vision_agent.tools as T
import matplotlib.pyplot as plt

# Run open-vocabulary detection with owl_v2 and overlay the boxes on the image
image = T.load_image("dogs.jpg")
dets = T.owl_v2("dogs", image)
viz = T.overlay_bounding_boxes(image, dets)
plt.imshow(viz)
plt.show()
```
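
The detections returned by these tools are plain Python data, so you can post-process
them before drawing. Below is a small sketch that keeps only confident detections and
saves the visualization to disk; it assumes each detection is a dict with `label`,
`score`, and `bbox` entries and that `save_image` is available in `vision_agent.tools`,
so check the tool documentation if your version differs:

```python
import vision_agent.tools as T

image = T.load_image("dogs.jpg")
dets = T.owl_v2("dogs", image)

# Drop low-confidence detections before visualizing (keys assumed: "score", "label")
confident = [d for d in dets if d["score"] >= 0.5]
print(f"{len(confident)} confident detections:", [d["label"] for d in confident])

viz = T.overlay_bounding_boxes(image, confident)
T.save_image(viz, "dogs_with_boxes.png")  # save_image signature assumed: (image, path)
```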

You can also add custom tools to the agent:
@@ -214,6 +212,40 @@ function. Make sure the documentation is in the same format above with description,
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
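
For reference, here is a minimal sketch of a custom tool documented in that format.
The function itself is hypothetical and the registration step is omitted; see the
linked example for how the tool is actually registered with the agent:

```python
def bbox_area(bbox: list[float]) -> float:
    """'bbox_area' returns the area of a normalized [x1, y1, x2, y2] bounding box.

    Parameters:
        bbox (list[float]): The bounding box in normalized coordinates.

    Returns:
        float: The area of the box as a fraction of the image area.

    Example
    -------
    >>> bbox_area([0.0, 0.0, 0.5, 0.5])
    0.25
    """
    # Width times height of the normalized box
    x1, y1, x2, y2 = bbox
    return (x2 - x1) * (y2 - y1)
```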

## Additional LLMs
### Ollama
We also provide an `OllamaVisionAgentCoder`, a version of `VisionAgentCoder` that uses
Ollama. To get started you must first download a few models:

```bash
ollama pull llama3.1
ollama pull mxbai-embed-large
```
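
Ollama itself must be running before the agent can call these models; if it is not
already running as a background service, you can start it with:

```bash
ollama serve
```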

`llama3.1` is used as the `OllamaLMM` for `OllamaVisionAgentCoder`. Normally we would
use an actual LMM such as `llava`, but `llava` cannot handle the long context lengths
required by the agent. Since `llama3.1` cannot handle images, you may see some
performance degradation. `mxbai-embed-large` is the embedding model used to look up
tools. You can use it just like you would use `VisionAgentCoder`:

```python
>>> import vision_agent as va
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```

### Azure OpenAI
We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get
started, follow the Azure Setup section below. You can use it just like you would use
`VisionAgentCoder`:

```python
>>> import vision_agent as va
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:

@@ -252,6 +284,6 @@ agent = va.agent.AzureVisionAgentCoder()
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

Failure to have sufficient API credits may result in limited or no functionality for
the features that rely on the OpenAI API. For more details on managing your API usage
and credits, please refer to the OpenAI API documentation.
docs/index.md: 67 changes (52 additions, 15 deletions)
@@ -1,4 +1,9 @@
# 🔍🤖 Vision Agent
[![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
[![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
![version](https://img.shields.io/pypi/pyversions/vision-agent)
</div>

Vision Agent is a library that helps you utilize agent frameworks to generate code to
solve your vision task. Many current vision problems can easily take hours or days to
@@ -160,20 +165,18 @@ result = agent.chat_with_workflow(conv)

### Tools
There are a variety of tools for the model or the user to use. Some are executed locally
while others are hosted for you. You can easily access them yourself; for example, if
you want to run `owl_v2` and visualize the output you can run:

```python
import vision_agent.tools as T
import matplotlib.pyplot as plt

# Run open-vocabulary detection with owl_v2 and overlay the boxes on the image
image = T.load_image("dogs.jpg")
dets = T.owl_v2("dogs", image)
viz = T.overlay_bounding_boxes(image, dets)
plt.imshow(viz)
plt.show()
```

You can also add custom tools to the agent:
@@ -206,6 +209,40 @@ function. Make sure the documentation is in the same format above with description,
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

## Additional LLMs
### Ollama
We also provide an `OllamaVisionAgentCoder`, a version of `VisionAgentCoder` that uses
Ollama. To get started you must first download a few models:

```bash
ollama pull llama3.1
ollama pull mxbai-embed-large
```

`llama3.1` is used as the `OllamaLMM` for `OllamaVisionAgentCoder`. Normally we would
use an actual LMM such as `llava`, but `llava` cannot handle the long context lengths
required by the agent. Since `llama3.1` cannot handle images, you may see some
performance degradation. `mxbai-embed-large` is the embedding model used to look up
tools. You can use it just like you would use `VisionAgentCoder`:

```python
>>> import vision_agent as va
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```

### Azure OpenAI
We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get
started, follow the Azure Setup section below. You can use it just like you would use
`VisionAgentCoder`:

```python
>>> import vision_agent as va
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:

@@ -244,6 +281,6 @@ agent = va.agent.AzureVisionAgentCoder()
2. Follow the instructions to purchase and manage your API credits.
3. Ensure your API key is correctly configured in your project settings.

Failure to have sufficient API credits may result in limited or no functionality for
the features that rely on the OpenAI API. For more details on managing your API usage
and credits, please refer to the OpenAI API documentation.
