Commit

update readme example (#36)

dillonalaird authored Apr 1, 2024
1 parent cf94bbf commit f8b4773
Showing 2 changed files with 15 additions and 34 deletions.
31 changes: 15 additions & 16 deletions README.md
@@ -42,8 +42,8 @@ You can interact with the agents as you would with any LLM or LMM model:
```python
>>> import vision_agent as va
>>> agent = VisionAgent()
->>> agent("How many apples are in this image?", image="apples.jpg")
-"There are 2 apples in the image."
+>>> agent("What percentage of the area of this jar is filled with coffee beans?", image="jar.jpg")
+"The percentage of area of the jar filled with coffee beans is 25%."
```

To better understand how the model came up with its answer, you can also run it in
@@ -57,22 +57,22 @@ You can also have it return the workflow it used to complete the task along with
the individual steps and tools to get the answer:

```python
->>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "How many apples are in this image?"}], image="apples.jpg")
+>>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of this jar is filled with coffee beans?"}], image="jar.jpg")
>>> print(workflow)
-[{"task": "Count the number of apples using 'grounding_dino_'.",
-  "tool": "grounding_dino_",
-  "parameters": {"prompt": "apple", "image": "apples.jpg"},
+[{"task": "Segment the jar using 'grounding_sam_'.",
+  "tool": "grounding_sam_",
+  "parameters": {"prompt": "jar", "image": "jar.jpg"},
   "call_results": [[
     {
-      "labels": ["apple", "apple"],
-      "scores": [0.99, 0.95],
+      "labels": ["jar"],
+      "scores": [0.99],
       "bboxes": [
         [0.58, 0.2, 0.72, 0.45],
-        [0.94, 0.57, 0.98, 0.66],
-      ]
+      ],
+      "masks": "mask.png"
     }
   ]],
-  "answer": "There are 2 apples in the image.",
+  "answer": "The jar is located at [0.58, 0.2, 0.72, 0.45].",
}]
```
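The updated answer reports an area percentage, which is what the switch from detection to segmentation enables: the fraction of jar pixels covered by beans can be read off the masks. A minimal sketch of that post-processing, assuming `mask.png` is a binary jar mask and a hypothetical `beans_mask.png` marks the bean pixels (neither filename convention is guaranteed by the library):

```python
# Sketch only: assumes mask.png marks the jar region and a hypothetical
# beans_mask.png marks the coffee beans; nonzero pixels belong to each mask.
import numpy as np
from PIL import Image

jar = np.array(Image.open("mask.png").convert("L")) > 0
beans = np.array(Image.open("beans_mask.png").convert("L")) > 0
# Fill percentage = bean pixels inside the jar / total jar pixels.
fill_pct = 100 * np.logical_and(jar, beans).sum() / jar.sum()
print(f"The jar is about {fill_pct:.0f}% filled with coffee beans.")
```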

@@ -84,13 +84,12 @@ you. For example:
```python
>>> import vision_agent as va
>>> llm = va.llm.OpenAILLM()
->>> detector = llm.generate_detector("Can you build an apple detector for me?")
->>> detector("apples.jpg")
-[{"labels": ["apple", "apple"],
-  "scores": [0.99, 0.95],
+>>> detector = llm.generate_detector("Can you build a jar detector for me?")
+>>> detector("jar.jpg")
+[{"labels": ["jar"],
+  "scores": [0.99],
   "bboxes": [
     [0.58, 0.2, 0.72, 0.45],
-    [0.94, 0.57, 0.98, 0.66],
   ]
}]
```
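The bounding boxes in both examples appear to be normalized to the [0, 1] range. A minimal sketch of scaling one back to pixel coordinates and drawing it, assuming the common [x_min, y_min, x_max, y_max] ordering (an assumption, not something this commit specifies):

```python
# Sketch only: assumes boxes are normalized [x_min, y_min, x_max, y_max].
from PIL import Image, ImageDraw

image = Image.open("jar.jpg")
draw = ImageDraw.Draw(image)
w, h = image.size
for x1, y1, x2, y2 in [[0.58, 0.2, 0.72, 0.45]]:
    # Scale normalized coordinates to pixel space before drawing.
    draw.rectangle((x1 * w, y1 * h, x2 * w, y2 * h), outline="red", width=3)
image.save("jar_annotated.jpg")
```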
18 changes: 0 additions & 18 deletions docs/old.md → docs/lmms_and_datastore.md
@@ -1,21 +1,3 @@
-<p align="center">
-  <img width="100" height="100" src="https://github.com/landing-ai/landingai-python/raw/main/assets/avi-logo.png">
-</p>
-
-# Welcome to the Landing AI LMM Tools Documentation
-
-This library provides a set of tools to help you build applications with Large Multimodal Models (LMMs).
-
-
-## Quick Start
-
-### Install
-First, install the library:
-
-```bash
-pip install vision-agent
-```
-
### LMMs
One of the problems of dealing with image data is that it can be difficult to organize and
search. For example, you might have a bunch of pictures of houses and want to count how
