
Commit

Update conversation (#217)
* update for new conv

* add artifact tools

* update local executor

* fix upload/download

* cleaned up code for artifacts

* starting artifact prompts

* app to add files to artifacts

* add support for artifacts

* add artifact meta tools

* ran isort

* prompt to work with artifacts

* minor fixes for prompts

* add docs, fix load and saving remote files

* rename prompts

* add docs for artifacts, allow None artifacts (which don't load) to be added

* e2b and local upload/download work similarly now, can pass in target download path

* add Artifacts to exports

* local chat app to work with artifacts

* updated docs

* fix flake8

* fix mypy errors

* fix format

* add execution to conversation

* fixed type errors

* fixed bug with upload file

* added ability to write media files to artifacts

* return outside of context

* make remote path execute variable

* add codec for video encoding

* fix prompts to include writing media artifacts

* isort

* fix typo

* added redisplay for nested notebook sessions

* return artifacts

* add trace for last edited artifact

* handle artifact return

* only add text to obs, no trace
dillonalaird authored Aug 30, 2024
1 parent 2842fdc commit cacae44
Showing 16 changed files with 560 additions and 401 deletions.
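The core of this commit is an `Artifacts` container that is pickled to disk and passed between the local chat app and the executor, with `None` entries acting as placeholders that are not loaded (so they never overwrite files already on disk). As a rough mental model, here is a simplified stand-in — the names and exact semantics are assumptions from the diffs below, not the actual `vision_agent.tools.Artifacts` implementation:

```python
import pickle
import tempfile
from pathlib import Path


class ArtifactStore:
    """Simplified stand-in for a pickle-backed artifacts container."""

    def __init__(self, remote_save_path):
        self.remote_save_path = Path(remote_save_path)
        # name -> file content, or None for a file that already exists
        # on disk and should not be loaded/overwritten
        self.artifacts = {}

    def save(self, local_path=None):
        path = Path(local_path) if local_path else self.remote_save_path
        with open(path, "wb") as f:
            pickle.dump(self.artifacts, f)

    def load(self, local_path):
        with open(local_path, "rb") as f:
            loaded = pickle.load(f)
        for name, content in loaded.items():
            # skip None entries so placeholders never clobber real files
            if content is not None:
                self.artifacts[name] = content


tmp = Path(tempfile.mkdtemp())
store = ArtifactStore(tmp / "artifacts.pkl")
store.artifacts["notes.txt"] = "hello"
store.save()

fresh = ArtifactStore(tmp / "artifacts.pkl")
fresh.artifacts["image.png"] = None  # uploaded file: placeholder only
fresh.load(tmp / "artifacts.pkl")
```

The round-trip through a single pickle file is what lets the Streamlit app and the agent share state without a database.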
18 changes: 11 additions & 7 deletions README.md
@@ -41,15 +41,15 @@ export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.

#### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
@@ -146,7 +146,7 @@ the code and having it update. You just need to add the code as a response from
assistant:

```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
Expand Down Expand Up @@ -212,6 +212,10 @@ function. Make sure the documentation is in the same format above with descripti
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository where
+we add the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
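One of the commit bullets above notes that the e2b and local executors now upload and download files the same way, and that the caller can pass a target download path. For a local executor that contract can be as simple as two path-preserving copies — the sketch below is a hypothetical stand-alone version, not the actual `vision_agent` executor API:

```python
import shutil
import tempfile
from pathlib import Path


def upload_file(local_path, remote_dir):
    """Copy a local file into the executor's workspace directory."""
    remote_dir = Path(remote_dir)
    remote_dir.mkdir(parents=True, exist_ok=True)
    dest = remote_dir / Path(local_path).name
    shutil.copy(local_path, dest)
    return dest


def download_file(remote_path, target_path):
    """Copy a workspace file to a caller-chosen target download path."""
    target = Path(target_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(remote_path, target)
    return target


root = Path(tempfile.mkdtemp())
src = root / "src.txt"
src.write_text("payload")

remote = upload_file(src, root / "workspace")
local_copy = download_file(remote, root / "out" / "result.txt")
```

Returning the destination path from both calls keeps the local and remote (e2b) code paths interchangeable for callers.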
21 changes: 13 additions & 8 deletions docs/index.md
@@ -38,15 +38,15 @@ export OPENAI_API_KEY="your-api-key"
```

### Vision Agent
-There are two agents that you can use. Vision Agent is a conversational agent that has
+There are two agents that you can use. `VisionAgent` is a conversational agent that has
access to tools that allow it to write and navigate Python code and file systems. It can
-converse with the user in natural language. VisionAgentCoder is an agent that can write
-code for vision tasks, such as counting people in an image. However, it cannot converse
-and can only respond with code. VisionAgent can call VisionAgentCoder to write vision
-code.
+converse with the user in natural language. `VisionAgentCoder` is an agent specifically
+for writing code for vision tasks, such as counting people in an image. However, it
+cannot chat with you and can only respond with code. `VisionAgent` can call
+`VisionAgentCoder` to write vision code.

#### Basic Usage
-To run the streamlit app locally to chat with Vision Agent, you can run the following
+To run the streamlit app locally to chat with `VisionAgent`, you can run the following
command:

```bash
@@ -143,7 +143,7 @@ the code and having it update. You just need to add the code as a response from
assistant:

```python
-agent = va.agent.VisionAgent(verbosity=2)
+agent = va.agent.VisionAgentCoder(verbosity=2)
conv = [
{
"role": "user",
Expand Down Expand Up @@ -209,6 +209,10 @@ function. Make sure the documentation is in the same format above with descripti
`Parameters:`, `Returns:`, and `Example\n-------`. You can find an example use case
[here](examples/custom_tools/) as this is what the agent uses to pick and use the tool.

+Can't find the tool you need and want to add it to `VisionAgent`? Check out our
+[vision-agent-tools](https://github.com/landing-ai/vision-agent-tools) repository where
+we add the source code for all the tools used in `VisionAgent`.

## Additional Backends
### Ollama
We also provide a `VisionAgentCoder` that uses Ollama. To get started you must download
@@ -230,6 +234,7 @@ tools. You can use it just like you would use `VisionAgentCoder`:
>>> agent = va.agent.OllamaVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
+> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.
### Azure OpenAI
We also provide a `AzureVisionAgentCoder` that uses Azure OpenAI models. To get started
@@ -241,7 +246,7 @@ follow the Azure Setup section below. You can use it just like you would use
>>> agent = va.agent.AzureVisionAgentCoder()
>>> agent("Count the apples in the image", media="apples.jpg")
```
-> WARNING: VisionAgent doesn't work well unless the underlying LMM is sufficiently powerful. Do not expect good results or even working code with smaller models like Llama 3.1 8B.


### Azure Setup
If you want to use Azure OpenAI models, you need to have two OpenAI model deployments:
16 changes: 14 additions & 2 deletions examples/chat/app.py
@@ -26,7 +26,14 @@
"response": "saved",
"style": {"bottom": "calc(50% - 4.25rem", "right": "0.4rem"},
}
-agent = va.agent.VisionAgent(verbosity=1)
+# set artifacts remote_path to WORKSPACE
+artifacts = va.tools.Artifacts(WORKSPACE / "artifacts.pkl")
+if Path("artifacts.pkl").exists():
+    artifacts.load("artifacts.pkl")
+else:
+    artifacts.save("artifacts.pkl")
+
+agent = va.agent.VisionAgent(verbosity=1, local_artifacts_path="artifacts.pkl")

st.set_page_config(layout="wide")

@@ -44,7 +51,9 @@


def update_messages(messages, lock):
-    new_chat = agent.chat_with_code(messages)
+    if Path("artifacts.pkl").exists():
+        artifacts.load("artifacts.pkl")
+    new_chat, _ = agent.chat_with_code(messages, artifacts=artifacts)
    with lock:
        for new_message in new_chat:
            if new_message not in messages:
@@ -122,6 +131,9 @@ def main():
        with open(WORKSPACE / uploaded_file.name, "wb") as f:
            f.write(uploaded_file.getbuffer())

+        # make it None so it won't load and overwrite the image
+        artifacts.artifacts[uploaded_file.name] = None
+
    for file in WORKSPACE.iterdir():
        if "__pycache__" not in str(file) and not str(file).startswith("."):
            if st.button(file.name):
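The `update_messages` hunk above merges the agent's returned chat history into the shared message list under a lock, appending only messages it has not already seen. Detached from `vision_agent` and Streamlit, the merge logic looks like this — a hypothetical stand-alone version where `agent.chat_with_code` is replaced by a plain input list:

```python
import threading


def merge_new_messages(messages, new_chat, lock):
    # Append only messages the shared history doesn't already contain,
    # holding the lock so concurrent UI callbacks don't race.
    with lock:
        for new_message in new_chat:
            if new_message not in messages:
                messages.append(new_message)


lock = threading.Lock()
messages = [{"role": "user", "content": "hi"}]
new_chat = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
]
merge_new_messages(messages, new_chat, lock)
```

The membership test keeps the merge idempotent, which matters because the agent returns the full conversation (old and new messages) rather than a delta.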
2 changes: 1 addition & 1 deletion vision_agent/agent/agent.py
@@ -11,7 +11,7 @@ def __call__(
        self,
        input: Union[str, List[Message]],
        media: Optional[Union[str, Path]] = None,
-    ) -> str:
+    ) -> Union[str, List[Message]]:
        pass

@abstractmethod
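The `agent.py` change widens `__call__`'s return type from `str` to `Union[str, List[Message]]`, so call sites that previously assumed a string now need a type check. A minimal sketch of how a caller might normalize both shapes, with `Message` modeled as a plain dict (an assumption consistent with the chat-message dicts elsewhere in this diff):

```python
from typing import Dict, List, Union

# Message modeled as a plain role/content dict for illustration
Message = Dict[str, str]


def response_text(result: Union[str, List[Message]]) -> str:
    """Collapse either return shape to displayable text."""
    if isinstance(result, str):
        return result
    return "\n".join(m.get("content", "") for m in result)


plain = response_text("done")
combined = response_text([{"role": "assistant", "content": "done"}])
```

Keeping the normalization in one helper means downstream display code never has to care which agent produced the result.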
