Polish API docs, auto publish docs in CI

landing-ai · Mar 26, 2024 · 04345a1
1 parent ac79fa6

Showing 18 changed files with 398 additions and 75 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,48 @@
+name: docs
+
+# build the documentation whenever there are new commits on main
+on:
+  push:
+    branches:
+      - main
+
+# security: restrict permissions for CI jobs.
+permissions:
+  contents: read
+
+jobs:
+  # Build the documentation and upload the static HTML files as an artifact.
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: 3.10.11
+
+      - uses: Gr1N/setup-poetry@v8
+        with:
+          poetry-version: "1.2.2"
+
+      - run: poetry install --all-extras
+      - run: mkdir -p docs-build
+      - run: poetry run mkdocs build -f mkdocs.yml -d docs-build/
+
+      - uses: actions/upload-pages-artifact@v1
+        with:
+          path: docs-build/
+
+  # Deploy the artifact to GitHub pages.
+  # This is a separate job so that only actions/deploy-pages has the necessary permissions.
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    permissions:
+      pages: write
+      id-token: write
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - id: deployment
+        uses: actions/deploy-pages@v2
diff --git a/.gitignore b/.gitignore
@@ -89,6 +89,7 @@ MANIFEST
 examples/output
 tests/output
 docs-build
+site
 
 # Local or WIP files
 local/
diff --git a/docs/_overrides/main.html b/docs/_overrides/main.html
@@ -0,0 +1,5 @@
+{% extends "base.html" %}
+
+{% block footer %}
+ {{ super() }}
+{% endblock %}
diff --git a/docs/api/agent.md b/docs/api/agent.md
@@ -0,0 +1,13 @@
+::: vision_agent.agent
+
+::: vision_agent.agent.agent
+
+::: vision_agent.agent.easytool
+
+::: vision_agent.agent.easytool_prompts
+
+::: vision_agent.agent.reflexion
+
+::: vision_agent.agent.reflexion_prompts
+
+::: vision_agent.agent.vision_agent
diff --git a/docs/api/data.md b/docs/api/data.md
@@ -0,0 +1,3 @@
+::: vision_agent.data
+
+::: vision_agent.data.data
diff --git a/docs/api/emb.md b/docs/api/emb.md
@@ -0,0 +1,3 @@
+::: vision_agent.emb
+
+::: vision_agent.emb.emb
diff --git a/docs/api/image_utils.md b/docs/api/image_utils.md
@@ -0,0 +1 @@
+::: vision_agent.image_utils
diff --git a/docs/api/llm.md b/docs/api/llm.md
@@ -0,0 +1,3 @@
+::: vision_agent.llm
+
+::: vision_agent.llm.llm
diff --git a/docs/api/lmm.md b/docs/api/lmm.md
@@ -0,0 +1,3 @@
+::: vision_agent.lmm
+
+::: vision_agent.lmm.lmm
diff --git a/docs/api/tools.md b/docs/api/tools.md
@@ -0,0 +1,5 @@
+::: vision_agent.tools
+
+::: vision_agent.tools.prompts
+
+::: vision_agent.tools.tools
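
Each `docs/api/*.md` page above is only a list of mkdocstrings `:::` identifiers, one module per paragraph. If more modules are added later, the stubs could be generated instead of written by hand. The sketch below is hypothetical: `API_PAGES`, `render_api_page` and `write_api_pages` are illustrative names that do not exist in this repository, and it assumes the one-identifier-per-paragraph layout used here.

```python
from pathlib import Path

# Hypothetical page -> modules mapping, mirroring some of the docs/api/*.md files above.
API_PAGES: dict[str, list[str]] = {
    "data": ["vision_agent.data", "vision_agent.data.data"],
    "emb": ["vision_agent.emb", "vision_agent.emb.emb"],
    "image_utils": ["vision_agent.image_utils"],
}


def render_api_page(modules: list[str]) -> str:
    """Render one mkdocstrings stub: a `::: module` line per module, blank-line separated."""
    return "\n\n".join(f"::: {module}" for module in modules) + "\n"


def write_api_pages(docs_dir: Path, pages: dict[str, list[str]]) -> list[Path]:
    """Write docs_dir/api/<name>.md for every page and return the written paths."""
    api_dir = docs_dir / "api"
    api_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, modules in pages.items():
        path = api_dir / f"{name}.md"
        path.write_text(render_api_page(modules), encoding="utf-8")
        written.append(path)
    return written
```

Running `write_api_pages(Path("docs"), API_PAGES)` before `mkdocs build` would keep the stubs in sync with the mapping; keeping them hand-written, as this commit does, is equally valid while the module list is short.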
diff --git a/docs/index.md b/docs/index.md
@@ -1,87 +1,95 @@
-<p align="center">
- <img width="100" height="100" src="https://github.com/landing-ai/landingai-python/raw/main/assets/avi-logo.png">
-</p>
+# 🔍🤖 Vision Agent
 
-# Welcome to the Landing AI LMM Tools Documentation
+Vision Agent is a library that helps you use agent frameworks for your vision tasks.
+Many vision problems can easily take hours or days to solve: you need to find the
+right model, figure out how to use it, and possibly write programming logic around it
+to accomplish your task or, even more expensive, train your own model. Vision Agent
+aims to provide an in-seconds experience by letting you describe your problem in
+text and using agent frameworks to solve the task for you. Check out our Discord
+for updates and roadmaps!
 
-This library provides a set of tools to help you build applications with Large Multimodal Model (LMM).
-
-
-## Quick Start
-
-### Install
-First, install the library:
+## Getting Started
+### Installation
+To get started, you can install the library using pip:
 
 ```bash
 pip install vision-agent
 ```
 
-### LMMs
-One of the problems of dealing with image data is it can be difficult to organize and
-search. For example, you might have a bunch of pictures of houses and want to count how
-many yellow houses you have, or how many houses with adobe roofs. The vision agent
-library uses LMMs to help create tags or descriptions of images to allow you to search
-over them, or use them in a database to carry out other operations.
-
-To get started, you can use an LMM to start generating text from images. The following
-code will use the LLaVA-1.6 34B model to generate a description of the image you pass it.
+Ensure you have an OpenAI API key and set it as an environment variable:
 
-```python
-import vision_agent as va
-
-model = va.lmm.get_lmm("llava")
-model.generate("Describe this image", "image.png")
->>> "A yellow house with a green lawn."
+```bash
+export OPENAI_API_KEY="your-api-key"
 ```
 
-**WARNING** We are hosting the LLaVA-1.6 34B model, if it times out please wait ~3-5
-min for the server to warm up as it shuts down when usage is low.
-
-### DataStore
-You can use the `DataStore` class to store your images, add new metadata to them such
-as descriptions, and search over different columns.
+### Vision Agents
+You can interact with the agents as you would with any LLM or LMM:
 
 ```python
-import vision_agent as va
-import pandas as pd
-
-df = pd.DataFrame({"image_paths": ["image1.png", "image2.png", "image3.png"]})
-ds = va.data.DataStore(df)
-ds = ds.add_lmm(va.lmm.get_lmm("llava"))
-ds = ds.add_embedder(va.emb.get_embedder("sentence-transformer"))
-
-ds = ds.add_column("descriptions", "Describe this image.")
+>>> from vision_agent.agent import VisionAgent
+>>> agent = VisionAgent()
+>>> agent("How many apples are in this image?", image="apples.jpg")
+"There are 2 apples in the image."
 ```
 
-This will use the prompt you passed, "Describe this image.", and the LMM to create a
-new column of descriptions for your image. Your data will now contain a new column with
-the descriptions of each image:
+To better understand how the model came up with its answer, you can also run it in
+debug mode by passing in the `verbose` argument:
 
-| image\_paths | image\_id | descriptions |
-| --- | --- | --- |
-| image1.png | 1 | "A yellow house with a green lawn." |
-| image2.png | 2 | "A white house with a two door garage." |
-| image3.png | 3 | "A wooden house in the middle of the forest." |
+```python
+>>> agent = VisionAgent(verbose=True)
+```
 
-You can now create an index on the descriptions column and search over it to find images
-that match your query.
+You can also have it return the workflow it used to complete the task, along with the
+individual steps and tools it used to get the answer:
 
 ```python
-ds = ds.build_index("descriptions")
-ds.search("A yellow house.", top_k=1)
->>> [{'image_paths': 'image1.png', 'image_id': 1, 'descriptions': 'A yellow house with a green lawn.'}]
+>>> resp, workflow = agent.chat_with_workflow([{"role": "user", "content": "How many apples are in this image?"}], image="apples.jpg")
+>>> print(workflow)
+[{"task": "Count the number of apples using 'grounding_dino_'.",
+ "tool": "grounding_dino_",
+ "parameters": {"prompt": "apple", "image": "apples.jpg"},
+ "call_results": [[
+ {
+ "labels": ["apple", "apple"],
+ "scores": [0.99, 0.95],
+ "bboxes": [
+ [0.58, 0.2, 0.72, 0.45],
+ [0.94, 0.57, 0.98, 0.66],
+ ]
+ }
+ ]],
+ "answer": "There are 2 apples in the image.",
+}]
 ```
 
-You can also create other columns for you data such as `is_yellow`:
+### Tools
+There are a variety of tools for the model or the user to use. Some are executed locally
+while others are hosted for you. You can also ask an LLM directly to build a tool for
+you. For example:
 
 ```python
-ds = ds.add_column("is_yellow", "Is the house in this image yellow? Please answer yes or no.")
+>>> import vision_agent as va
+>>> llm = va.llm.OpenAILLM()
+>>> detector = llm.generate_detector("Can you build an apple detector for me?")
+>>> detector("apples.jpg")
+[{"labels": ["apple", "apple"],
+ "scores": [0.99, 0.95],
+ "bboxes": [
+ [0.58, 0.2, 0.72, 0.45],
+ [0.94, 0.57, 0.98, 0.66],
+ ]
+}]
 ```
 
-which would give you a dataset similar to this:
+| Tool | Description |
+| --- | --- |
+| CLIP | CLIP is a tool that can classify or tag any image given a set of input classes or tags. |
+| GroundingDINO | GroundingDINO is a tool that can detect arbitrary objects with inputs such as category names or referring expressions. |
+| GroundingSAM | GroundingSAM is a tool that can detect and segment arbitrary objects with inputs such as category names or referring expressions. |
+| Counter | Counter detects and counts the number of objects in an image given an input such as a category name or referring expression. |
+| Crop | Crop crops an image given a bounding box and returns the file name of the cropped image. |
+| BboxArea | BboxArea returns the area of the bounding box in pixels, rounded to 2 decimal places. |
+| SegArea | SegArea returns the area of the segmentation mask in pixels, rounded to 2 decimal places. |
+
 
-| image\_paths | image\_id | descriptions | is\_yellow |
-| --- | --- | --- | --- |
-| image1.png | 1 | "A yellow house with a green lawn." | "yes" |
-| image2.png | 2 | "A white house with a two door garage." | "no" |
-| image3.png | 3 | "A wooden house in the middle of the forest." | "no" |
+It also has a basic set of calculator tools such as add, subtract, multiply and divide.
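
The workflow returned by `chat_with_workflow` and the detector output in the new `index.md` are plain lists of dictionaries, so they can be post-processed with ordinary Python. The sketch below assumes exactly the shapes shown in those examples (step keys `task`, `tool`, `call_results`, `answer`; detections with parallel `labels`/`scores`/`bboxes` lists). The helper names and the 0.5 default score threshold are hypothetical and not part of the library.

```python
from collections import Counter
from typing import Any

# Sample data copied from the workflow example above (format assumed, not guaranteed).
WORKFLOW: list[dict[str, Any]] = [
    {
        "task": "Count the number of apples using 'grounding_dino_'.",
        "tool": "grounding_dino_",
        "parameters": {"prompt": "apple", "image": "apples.jpg"},
        "call_results": [[
            {
                "labels": ["apple", "apple"],
                "scores": [0.99, 0.95],
                "bboxes": [
                    [0.58, 0.2, 0.72, 0.45],
                    [0.94, 0.57, 0.98, 0.66],
                ],
            }
        ]],
        "answer": "There are 2 apples in the image.",
    }
]


def count_labels(detection: dict[str, Any], min_score: float = 0.5) -> Counter:
    """Count detected labels whose score is at least min_score (hypothetical helper)."""
    return Counter(
        label
        for label, score in zip(detection["labels"], detection["scores"])
        if score >= min_score
    )


def summarize_workflow(workflow: list[dict[str, Any]]) -> list[str]:
    """Return one line per workflow step with the tool used and the step's answer."""
    return [
        f"step {i}: {step['tool']} -> {step['answer']}"
        for i, step in enumerate(workflow, start=1)
    ]


if __name__ == "__main__":
    for line in summarize_workflow(WORKFLOW):
        print(line)
    # call_results is assumed to hold one list of detection dicts per tool call.
    for call in WORKFLOW[0]["call_results"]:
        for detection in call:
            print(count_labels(detection))
```

Raising `min_score` is a simple way to drop low-confidence boxes before counting; the right threshold depends on the tool and is not specified in these docs.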
diff --git a/docs/old.md b/docs/old.md
@@ -0,0 +1,87 @@
+<p align="center">
+ <img width="100" height="100" src="https://github.com/landing-ai/landingai-python/raw/main/assets/avi-logo.png">
+</p>
+
+# Welcome to the Landing AI LMM Tools Documentation
+
+This library provides a set of tools to help you build applications with Large Multimodal Models (LMMs).
+
+
+## Quick Start
+
+### Install
+First, install the library:
+
+```bash
+pip install vision-agent
+```
+
+### LMMs
+One of the problems with image data is that it can be difficult to organize and
+search. For example, you might have a bunch of pictures of houses and want to count how
+many yellow houses you have, or how many houses have adobe roofs. The Vision Agent
+library uses LMMs to create tags or descriptions of images so you can search
+over them or use them in a database to carry out other operations.
+
+To get started, you can use an LMM to start generating text from images. The following
+code will use the LLaVA-1.6 34B model to generate a description of the image you pass it.
+
+```python
+import vision_agent as va
+
+model = va.lmm.get_lmm("llava")
+model.generate("Describe this image", "image.png")
+# Output: "A yellow house with a green lawn."
+```
+
+**WARNING** We host the LLaVA-1.6 34B model on a server that shuts down when usage is
+low. If a request times out, please wait ~3-5 min for the server to warm up.
+
+### DataStore
+You can use the `DataStore` class to store your images, add new metadata to them such
+as descriptions, and search over different columns.
+
+```python
+import vision_agent as va
+import pandas as pd
+
+df = pd.DataFrame({"image_paths": ["image1.png", "image2.png", "image3.png"]})
+ds = va.data.DataStore(df)
+ds = ds.add_lmm(va.lmm.get_lmm("llava"))
+ds = ds.add_embedder(va.emb.get_embedder("sentence-transformer"))
+
+ds = ds.add_column("descriptions", "Describe this image.")
+```
+
+This uses the prompt you passed, "Describe this image.", and the LMM to generate a
+description of each image. Your data will now contain a new `descriptions` column:
+
+| image\_paths | image\_id | descriptions |
+| --- | --- | --- |
+| image1.png | 1 | "A yellow house with a green lawn." |
+| image2.png | 2 | "A white house with a two door garage." |
+| image3.png | 3 | "A wooden house in the middle of the forest." |
+
+You can now create an index on the descriptions column and search over it to find images
+that match your query.
+
+```python
+ds = ds.build_index("descriptions")
+ds.search("A yellow house.", top_k=1)
+# [{'image_paths': 'image1.png', 'image_id': 1, 'descriptions': 'A yellow house with a green lawn.'}]
+```
+
+You can also create other columns for your data, such as `is_yellow`:
+
+```python
+ds = ds.add_column("is_yellow", "Is the house in this image yellow? Please answer yes or no.")
+```
+
+which would give you a dataset similar to this:
+
+| image\_paths | image\_id | descriptions | is\_yellow |
+| --- | --- | --- | --- |
+| image1.png | 1 | "A yellow house with a green lawn." | "yes" |
+| image2.png | 2 | "A white house with a two door garage." | "no" |
+| image3.png | 3 | "A wooden house in the middle of the forest." | "no" |
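
In the old docs, `build_index` and `search` rank rows by how similar a column's text is to the query, presumably using the embedder added with `add_embedder`. As a dependency-free illustration of that top-k retrieval idea, the sketch below swaps in bag-of-words cosine similarity for learned sentence embeddings. `toy_search` and its helpers are hypothetical and not part of `DataStore`, and their scores will differ from a real embedder's.

```python
import math
import re
from collections import Counter

# Rows mirroring the example table above.
ROWS = [
    {"image_paths": "image1.png", "image_id": 1, "descriptions": "A yellow house with a green lawn."},
    {"image_paths": "image2.png", "image_id": 2, "descriptions": "A white house with a two door garage."},
    {"image_paths": "image3.png", "image_id": 3, "descriptions": "A wooden house in the middle of the forest."},
]


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens (a stand-in for a real embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_search(rows: list[dict], column: str, query: str, top_k: int = 1) -> list[dict]:
    """Return the top_k rows whose `column` text is most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        rows,
        key=lambda row: cosine(query_vec, bag_of_words(row[column])),
        reverse=True,
    )
    return ranked[:top_k]
```

With these rows, `toy_search(ROWS, "descriptions", "A yellow house.")` returns the `image1.png` row, matching the old docs' example. Word overlap cannot match synonyms ("golden home" would score poorly), which is why an embedding model is used in practice.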
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -0,0 +1,43 @@
+site_name: Landing AI Vision Agent Library Documentation
+site_url: https://landing-ai.github.io/
+repo_url: https://github.com/landing-ai/vision-agent
+edit_uri: edit/main/docs/
+
+
+theme:
+  name: "material"
+  custom_dir: docs/_overrides
+  features:
+    - content.code.copy
+    - content.code.annotate
+    - content.action.edit
+
+plugins:
+  - mkdocstrings
+  - search
+
+markdown_extensions:
+  # Syntax highlight
+  - pymdownx.highlight:
+      anchor_linenums: true
+      line_spans: __span
+      pygments_lang_class: true
+  - pymdownx.inlinehilite
+  - pymdownx.snippets
+  - pymdownx.superfences
+
+  # Multiline note/warning/etc blocks (https://squidfunk.github.io/mkdocs-material/reference/admonitions)
+  - admonition
+  - pymdownx.details
+
+nav:
+  - Quick start: index.md
+  - APIs:
+      - vision_agent.agent: api/agent.md
+      - vision_agent.tools: api/tools.md
+      - vision_agent.llm: api/llm.md
+      - vision_agent.lmm: api/lmm.md
+      - vision_agent.data: api/data.md
+      - vision_agent.emb: api/emb.md
+      - vision_agent.image_utils: api/image_utils.md
+  - Old documentation: old.md
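
With `repo_url` and `edit_uri` set as above and `content.action.edit` enabled, the Material theme shows an edit button on each page. Roughly, that link is `repo_url`, then `edit_uri`, then the page's path relative to `docs/`. The hypothetical `edit_url` helper below approximates that composition so the expected link for a page can be checked by hand; it is an assumption about the resulting URL, not MkDocs' own implementation.

```python
def edit_url(repo_url: str, edit_uri: str, page_src_path: str) -> str:
    """Approximate the edit link for a page (hypothetical helper, not MkDocs code).

    page_src_path is the page path relative to the docs directory, e.g. "api/agent.md".
    """
    base = repo_url.rstrip("/") + "/"       # ensure exactly one slash after the repo URL
    prefix = edit_uri.strip("/") + "/"      # normalize "edit/main/docs" vs "/edit/main/docs/"
    return base + prefix + page_src_path.lstrip("/")


# Values copied from mkdocs.yml above.
REPO_URL = "https://github.com/landing-ai/vision-agent"
EDIT_URI = "edit/main/docs/"
```

For example, `edit_url(REPO_URL, EDIT_URI, "api/agent.md")` gives `https://github.com/landing-ai/vision-agent/edit/main/docs/api/agent.md`, which should point at the `docs/api/agent.md` file added in this commit.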