From 41d4718e39782d00462ff4e32e4e2acafbeefb5f Mon Sep 17 00:00:00 2001
From: Dillon Laird
Date: Sun, 25 Aug 2024 20:09:22 -0700
Subject: [PATCH] updated README

---
 README.md     | 62 +++++++++++++++++++++++++++++++++++------------
 docs/index.md | 67 +++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 99 insertions(+), 30 deletions(-)

diff --git a/README.md b/README.md
index a5f24480..b5bb5401 100644
--- a/README.md
+++ b/README.md
@@ -168,20 +168,18 @@ result = agent.chat_with_workflow(conv)
 
 ### Tools
 There are a variety of tools for the model or the user to use. Some are executed locally
-while others are hosted for you. You can also ask an LMM directly to build a tool for
-you. For example:
+while others are hosted for you. You can easily access them yourself; for example, if
+you want to run `owl_v2` and visualize the output, you can run:
 
 ```python
->>> import vision_agent as va
->>> lmm = va.lmm.OpenAILMM()
->>> detector = lmm.generate_detector("Can you build a jar detector for me?")
->>> detector(va.tools.load_image("jar.jpg"))
-[{"labels": ["jar",],
-  "scores": [0.99],
-  "bboxes": [
-    [0.58, 0.2, 0.72, 0.45],
-  ]
-}]
+import vision_agent.tools as T
+import matplotlib.pyplot as plt
+
+image = T.load_image("dogs.jpg")
+dets = T.owl_v2("dogs", image)
+viz = T.overlay_bounding_boxes(image, dets)
+plt.imshow(viz)
+plt.show()
 ```
 
 You can also add custom tools to the agent:
@@ -214,6 +212,40 @@ function. Make sure the documentation is in the same format above with descripti
 `Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
 [here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
 
+## Additional LLMs
+### Ollama
+We also provide a `VisionAgentCoder` that uses Ollama. To get started, you must download
+a few models:
+
+```bash
+ollama pull llama3.1
+ollama pull mxbai-embed-large
+```
+
+`llama3.1` is used for the `OllamaLMM` in `OllamaVisionAgentCoder`. Normally we would
+use an actual LMM such as `llava`, but `llava` cannot handle the long context lengths
+required by the agent. Since `llama3.1` cannot handle images, you may see some
+performance degradation. `mxbai-embed-large` is the embedding model used to look up
+tools. You can use it just like you would use `VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.OllamaVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+### Azure OpenAI
+We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started,
+follow the Azure Setup section below. You can use it just like you would use
+`VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.AzureVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+
 ### Azure Setup
 If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
 
@@ -252,6 +284,6 @@ agent = va.agent.AzureVisionAgentCoder()
 2. Follow the instructions to purchase and manage your API credits.
 3. Ensure your API key is correctly configured in your project settings.
 
-Failure to have sufficient API credits may result in limited or no functionality for the features that rely on the OpenAI API.
-
-For more details on managing your API usage and credits, please refer to the OpenAI API documentation.
+Failure to have sufficient API credits may result in limited or no functionality for
+the features that rely on the OpenAI API. For more details on managing your API usage
+and credits, please refer to the OpenAI API documentation.
diff --git a/docs/index.md b/docs/index.md
index d0ed8178..569231de 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,4 +1,9 @@
 # 🔍🤖 Vision Agent
+[![](https://dcbadge.vercel.app/api/server/wPdN8RCYew?compact=true&style=flat)](https://discord.gg/wPdN8RCYew)
+![ci_status](https://github.com/landing-ai/vision-agent/actions/workflows/ci_cd.yml/badge.svg)
+[![PyPI version](https://badge.fury.io/py/vision-agent.svg)](https://badge.fury.io/py/vision-agent)
+![version](https://img.shields.io/pypi/pyversions/vision-agent)
+
 
 Vision Agent is a library that helps you utilize agent frameworks to generate code to
 solve your vision task. Many current vision problems can easily take hours or days to
@@ -160,20 +165,18 @@ result = agent.chat_with_workflow(conv)
 
 ### Tools
 There are a variety of tools for the model or the user to use. Some are executed locally
-while others are hosted for you. You can also ask an LMM directly to build a tool for
-you. For example:
+while others are hosted for you. You can easily access them yourself; for example, if
+you want to run `owl_v2` and visualize the output, you can run:
 
 ```python
->>> import vision_agent as va
->>> lmm = va.lmm.OpenAILMM()
->>> detector = lmm.generate_detector("Can you build a jar detector for me?")
->>> detector(va.tools.load_image("jar.jpg"))
-[{"labels": ["jar",],
-  "scores": [0.99],
-  "bboxes": [
-    [0.58, 0.2, 0.72, 0.45],
-  ]
-}]
+import vision_agent.tools as T
+import matplotlib.pyplot as plt
+
+image = T.load_image("dogs.jpg")
+dets = T.owl_v2("dogs", image)
+viz = T.overlay_bounding_boxes(image, dets)
+plt.imshow(viz)
+plt.show()
 ```
 
 You can also add custom tools to the agent:
@@ -206,6 +209,40 @@ function. Make sure the documentation is in the same format above with descripti
 `Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
 [here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
 
+## Additional LLMs
+### Ollama
+We also provide a `VisionAgentCoder` that uses Ollama. To get started, you must download
+a few models:
+
+```bash
+ollama pull llama3.1
+ollama pull mxbai-embed-large
+```
+
+`llama3.1` is used for the `OllamaLMM` in `OllamaVisionAgentCoder`. Normally we would
+use an actual LMM such as `llava`, but `llava` cannot handle the long context lengths
+required by the agent. Since `llama3.1` cannot handle images, you may see some
+performance degradation. `mxbai-embed-large` is the embedding model used to look up
+tools. You can use it just like you would use `VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.OllamaVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+### Azure OpenAI
+We also provide an `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started,
+follow the Azure Setup section below. You can use it just like you would use
+`VisionAgentCoder`:
+
+```python
+>>> import vision_agent as va
+>>> agent = va.agent.AzureVisionAgentCoder()
+>>> agent("Count the apples in the image", media="apples.jpg")
+```
+
+
 ### Azure Setup
 If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
 
@@ -244,6 +281,6 @@ agent = va.agent.AzureVisionAgentCoder()
 2. Follow the instructions to purchase and manage your API credits.
 3. Ensure your API key is correctly configured in your project settings.
 
-Failure to have sufficient API credits may result in limited or no functionality for the features that rely on the OpenAI API.
-
-For more details on managing your API usage and credits, please refer to the OpenAI API documentation.
+Failure to have sufficient API credits may result in limited or no functionality for
+the features that rely on the OpenAI API. For more details on managing your API usage
+and credits, please refer to the OpenAI API documentation.
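
Both files reference a required docstring format for custom tools: a description followed by `Parameters:`, `Returns:`, and an `Example\n-------` block. Below is a minimal sketch of what a tool documented in that format might look like. The function name, its body, and the return schema are hypothetical placeholders; the actual registration mechanism and schema should be taken from the linked `examples/custom_tools/` directory.

```python
from typing import Dict, List

import numpy as np


def detect_red_squares(image: np.ndarray) -> List[Dict]:
    """'detect_red_squares' is a hypothetical tool shown only to illustrate the
    docstring layout the agent expects: a short description, then Parameters,
    Returns, and an Example section.

    Parameters:
        image (np.ndarray): The image to run detection on.

    Returns:
        List[Dict]: A list of detections; this schema is an assumption and
            should mirror the tools in examples/custom_tools/.

    Example
    -------
        >>> detect_red_squares(image)
        [{"label": "red square", "score": 0.99, "bbox": [0.1, 0.2, 0.3, 0.4]}]
    """
    # Placeholder body; a real tool would implement its detection logic here.
    return [{"label": "red square", "score": 0.99, "bbox": [0.1, 0.2, 0.3, 0.4]}]
```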