update readme example #36

Merged · 1 commit · Apr 1, 2024
31 changes: 15 additions & 16 deletions README.md
@@ -42,8 +42,8 @@ You can interact with the agents as you would with any LLM or LMM model:
```python
>>> import vision_agent as va
>>> agent = VisionAgent()
->>> agent("How many apples are in this image?", image="apples.jpg")
-"There are 2 apples in the image."
+>>> agent("What percentage of the area of this jar is filled with coffee beans?", image="jar.jpg")
+"The percentage of area of the jar filled with coffee beans is 25%."
```
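
For readers trying the new example outside the interactive prompt, here is a minimal script sketch. It relies only on the constructor and call signature shown above; the assumption that `VisionAgent` is importable from the package top level mirrors the REPL snippet, and `jar.jpg` is a placeholder path.

```python
# Minimal script sketch (assumes VisionAgent is exported at the package top
# level, as the REPL example implies; "jar.jpg" is a placeholder image path).
from vision_agent import VisionAgent

agent = VisionAgent()
answer = agent(
    "What percentage of the area of this jar is filled with coffee beans?",
    image="jar.jpg",
)
print(answer)  # e.g. "The percentage of area of the jar filled with coffee beans is 25%."
```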

To better understand how the model came up with its answer, you can also run it in
@@ -57,22 +57,22 @@ You can also have it return the workflow it used to complete the task along with
the individual steps and tools to get the answer:

```python
->>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "How many apples are in this image?"}], image="apples.jpg")
+>>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "What percentage of the area of this jar is filled with coffee beans?"}], image="jar.jpg")
>>> print(workflow)
-[{"task": "Count the number of apples using 'grounding_dino_'.",
-  "tool": "grounding_dino_",
-  "parameters": {"prompt": "apple", "image": "apples.jpg"},
+[{"task": "Segment the jar using 'grounding_sam_'.",
+  "tool": "grounding_sam_",
+  "parameters": {"prompt": "jar", "image": "jar.jpg"},
  "call_results": [[
    {
-      "labels": ["apple", "apple"],
-      "scores": [0.99, 0.95],
+      "labels": ["jar"],
+      "scores": [0.99],
      "bboxes": [
        [0.58, 0.2, 0.72, 0.45],
-        [0.94, 0.57, 0.98, 0.66],
-      ]
+      ],
+      "masks": "mask.png"
    }
  ]],
-  "answer": "There are 2 apples in the image.",
+  "answer": "The jar is located at [0.58, 0.2, 0.72, 0.45].",
}]
```
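
The workflow above returns a segmentation mask path (`mask.png`) for the jar. To turn masks into the percentage the question asks for, one hedged approach is to obtain a second mask for the coffee beans (for example, another `grounding_sam_` call with the prompt "coffee beans") and compare the two areas. The file names and the second call below are assumptions for illustration, not part of the documented output.

```python
# Hypothetical post-processing sketch: estimate the fill percentage from two
# binary masks. "jar_mask.png" and "beans_mask.png" are illustrative file
# names, assumed to come from separate segmentation calls.
import numpy as np
from PIL import Image

jar_mask = np.array(Image.open("jar_mask.png").convert("L")) > 0
beans_mask = np.array(Image.open("beans_mask.png").convert("L")) > 0

# Count bean pixels that fall inside the jar, then divide by the jar area.
fill_ratio = (beans_mask & jar_mask).sum() / max(jar_mask.sum(), 1)
print(f"The jar is roughly {fill_ratio:.0%} full of coffee beans.")
```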

@@ -84,13 +84,12 @@ you. For example:
```python
>>> import vision_agent as va
>>> llm = va.llm.OpenAILLM()
->>> detector = llm.generate_detector("Can you build an apple detector for me?")
->>> detector("apples.jpg")
-[{"labels": ["apple", "apple"],
-  "scores": [0.99, 0.95],
+>>> detector = llm.generate_detector("Can you build a jar detector for me?")
+>>> detector("jar.jpg")
+[{"labels": ["jar"],
+  "scores": [0.99],
  "bboxes": [
    [0.58, 0.2, 0.72, 0.45],
-    [0.94, 0.57, 0.98, 0.66],
  ]
}]
```
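
The bounding boxes in the detector output appear to be normalized to the [0, 1] range. Assuming the layout is `[x_min, y_min, x_max, y_max]` relative to the image width and height (an assumption, not documented behavior), a quick way to sanity-check a detection is to scale it back to pixels and draw it:

```python
# Visualization sketch; assumes normalized [x_min, y_min, x_max, y_max] boxes
# and reuses the example detection shown above. "jar.jpg" is a placeholder path.
from PIL import Image, ImageDraw

detections = [{"labels": ["jar"], "scores": [0.99],
               "bboxes": [[0.58, 0.2, 0.72, 0.45]]}]

image = Image.open("jar.jpg")
draw = ImageDraw.Draw(image)
width, height = image.size

for det in detections:
    for label, score, box in zip(det["labels"], det["scores"], det["bboxes"]):
        x1, y1, x2, y2 = (box[0] * width, box[1] * height,
                          box[2] * width, box[3] * height)
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1, max(y1 - 12, 0)), f"{label} {score:.2f}", fill="red")

image.save("jar_detections.png")
```
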
18 changes: 0 additions & 18 deletions docs/old.md → docs/lmms_and_datastore.md
@@ -1,21 +1,3 @@
-<p align="center">
-    <img width="100" height="100" src="https://github.com/landing-ai/landingai-python/raw/main/assets/avi-logo.png">
-</p>
-
-# Welcome to the Landing AI LMM Tools Documentation
-
-This library provides a set of tools to help you build applications with Large Multimodal Model (LMM).
-
-
-## Quick Start
-
-### Install
-First, install the library:
-
-```bash
-pip install vision-agent
-```
-
### LMMs
One of the problems of dealing with image data is it can be difficult to organize and
search. For example, you might have a bunch of pictures of houses and want to count how