updated docs;
dillonalaird committed Jul 16, 2024
1 parent b0bafb9 commit 8fde0de
Showing 5 changed files with 98 additions and 15 deletions.
35 changes: 28 additions & 7 deletions README.md
@@ -43,13 +43,32 @@ export OPENAI_API_KEY="your-api-key"
### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credit, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits)


### Vision Agent
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
There are two agents that you can use. VisionAgent is a conversational agent that has
access to tools that allow it to write and navigate Python code. It can converse with
the user in natural language. VisionAgentCoder is an agent that can write code for
vision tasks, such as counting people in an image. However, it cannot converse and can
only respond with code. VisionAgent can call VisionAgentCoder to write vision code.

#### Basic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```
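As the transcript above shows, the assistant's `content` field is a Python-literal dict serialized to a string. A minimal sketch of recovering the structured reply (assuming the exact format shown above; `ast.literal_eval` is used because the payload is a Python literal, not JSON):

```python
import ast

# The conversation returned by the agent, copied from the example above.
resp = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"},
]

# The assistant's content is a stringified Python dict, so
# ast.literal_eval safely recovers its fields.
parsed = ast.literal_eval(resp[-1]["content"])
print(parsed["response"])  # Hello! How can I assist you today?
```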

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -90,7 +109,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -180,9 +199,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses. Global
variables will not be captured by `register_tool` so you need to include them in the
function. Make sure the documentation is in the same format above with description,
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.
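A standalone sketch of a tool written in the required docstring format, with its constants defined inside the function body (shown undecorated so it runs on its own; with `vision-agent` installed you would apply `@va.tools.register_tool` with the imports it needs, as described above):

```python
import numpy as np

def custom_tool(image_path: str) -> np.ndarray:
    """Returns a blank 10x10 mask for the given image.

    Parameters:
        image_path (str): Path to the input image.

    Returns:
        np.ndarray: A 10x10 array of zeros.

    Example
    -------
        >>> custom_tool("image.jpg")
    """
    # Globals are not captured by register_tool, so any constants the
    # tool relies on are defined inside the function body.
    mask_size = 10
    return np.zeros((mask_size, mask_size))
```

The agent reads the description, `Parameters:`, `Returns:`, and `Example` sections to decide when and how to call the tool, so keeping this structure intact matters more than the function body itself.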

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -209,7 +230,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```
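Before running the Azure agent, the two deployments mentioned above are typically wired up through environment variables. A hypothetical sketch (the variable names and values here are assumptions; check the Azure Setup section of the full README for the exact names your installation expects):

```shell
# Hypothetical variable names -- placeholders, not verified against
# the library; substitute your real key, endpoint, and deployments.
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
```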

******************************************************************************************************************************
35 changes: 28 additions & 7 deletions docs/index.md
@@ -35,13 +35,32 @@ export OPENAI_API_KEY="your-api-key"
### Important Note on API Usage
Please be aware that using the API in this project requires you to have API credits (minimum of five US dollars). This is different from the OpenAI subscription used in this chatbot. If you don't have credit, further information can be found [here](https://github.com/landing-ai/vision-agent?tab=readme-ov-file#how-to-get-started-with-openai-api-credits)


### Vision Agent
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:
There are two agents that you can use. VisionAgent is a conversational agent that has
access to tools that allow it to write and navigate Python code. It can converse with
the user in natural language. VisionAgentCoder is an agent that can write code for
vision tasks, such as counting people in an image. However, it cannot converse and can
only respond with code. VisionAgent can call VisionAgentCoder to write vision code.

#### Basic Usage
```python
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> print(resp)
[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "{'thoughts': 'The user has greeted me. I will respond with a greeting and ask how I can assist them.', 'response': 'Hello! How can I assist you today?', 'let_user_respond': True}"}]
>>> resp.append({"role": "user", "content": "Can you count the number of people in this image?", "media": ["people.jpg"]})
>>> resp = agent(resp)
```

### Vision Agent Coder
#### Basic Usage
You can interact with the agent as you would with any LLM or LMM model:

```python
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
```

@@ -82,7 +101,7 @@ To better understand how the model came up with its answer, you can run it in debug
mode by passing in the verbose argument:

```python
>>> agent = VisionAgent(verbose=2)
>>> agent = VisionAgentCoder(verbose=2)
```

#### Detailed Usage
@@ -172,9 +191,11 @@ def custom_tool(image_path: str) -> str:
return np.zeros((10, 10))
```

You need to ensure you call `@va.tools.register_tool` with any imports it might use and
ensure the documentation is in the same format above with description, `Parameters:`,
`Returns:`, and `Example\n-------`. You can find an example use case [here](examples/custom_tools/).
You need to ensure you call `@va.tools.register_tool` with any imports it uses. Global
variables will not be captured by `register_tool` so you need to include them in the
function. Make sure the documentation is in the same format above with description,
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
@@ -201,7 +222,7 @@ You can then run Vision Agent using the Azure OpenAI models:

```python
import vision_agent as va
agent = va.agent.AzureVisionAgent()
agent = va.agent.AzureVisionAgentCoder()
```

******************************************************************************************************************************
39 changes: 39 additions & 0 deletions vision_agent/agent/vision_agent.py
@@ -100,6 +100,20 @@ def parse_execution(response: str) -> Optional[str]:


class VisionAgent(Agent):
"""Vision Agent is an agent that can chat with the user and call tools or other
agents to generate code for it. Vision Agent uses Python code to execute actions for
the user. Vision Agent is inspired by OpenDevin
https://github.com/OpenDevin/OpenDevin and CodeAct https://arxiv.org/abs/2402.01030
Example
-------
>>> from vision_agent.agent import VisionAgent
>>> agent = VisionAgent()
>>> resp = agent("Hello")
>>> resp.append({"role": "user", "content": "Can you write a function that counts dogs?", "media": ["dog.jpg"]})
>>> resp = agent(resp)
"""

def __init__(
self,
agent: Optional[LMM] = None,
@@ -120,6 +134,17 @@ def __call__(
input: Union[str, List[Message]],
media: Optional[Union[str, Path]] = None,
) -> str:
"""Chat with VisionAgent and get the conversation response.
Parameters:
input (Union[str, List[Message]]): A conversation in the format of
[{"role": "user", "content": "describe your task here..."}, ...] or a
string of just the contents.
media (Optional[Union[str, Path]]): The media file to be used in the task.
Returns:
str: The conversation response.
"""
if isinstance(input, str):
input = [{"role": "user", "content": input}]
if media is not None:
@@ -131,6 +156,20 @@ def chat_with_code(
self,
chat: List[Message],
) -> List[Message]:
"""Chat with VisionAgent; it will use code to execute actions to accomplish
its tasks.
Parameters:
chat (List[Message]): A conversation
in the format of:
[{"role": "user", "content": "describe your task here..."}]
or if it contains media files, it should be in the format of:
[{"role": "user", "content": "describe your task here...", "media": ["image1.jpg", "image2.jpg"]}]
Returns:
List[Message]: The conversation response.
"""

if not chat:
raise ValueError("chat cannot be empty")

2 changes: 1 addition & 1 deletion vision_agent/agent/vision_agent_coder.py
@@ -492,7 +492,7 @@ class VisionAgentCoder(Agent):
Example
-------
>>> from vision_agent import VisionAgentCoder
>>> from vision_agent.agent import VisionAgentCoder
>>> agent = VisionAgentCoder()
>>> code = agent("What percentage of the area of the jar is filled with coffee beans?", media="jar.jpg")
"""
2 changes: 2 additions & 0 deletions vision_agent/tools/meta_tools.py
@@ -6,6 +6,8 @@
from vision_agent.lmm.types import Message
from vision_agent.tools.tool_utils import get_tool_documentation

# These tools are adapted from SWE-Agent https://github.com/princeton-nlp/SWE-agent

CURRENT_FILE = None
CURRENT_LINE = 0
DEFAULT_WINDOW_SIZE = 100
