
Commit ab94101

Merge pull request #1021 from IBM/replicate

DPK LLM Agent

2 parents 96e78be + 3f04bba


69 files changed (+6034, -0 lines)
Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div style=\"background-color: #04D7FD; padding: 20px; text-align: left;\">\n",
" <h1 style=\"color: #000000; font-size: 30px; margin: 0;\">data-prep-kit planning agent</h1> \n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qq -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import HTML\n",
"task = \"Process the provided PDF dataset to identify and extract only documents that don't contain inappropriate language. Remove the duplications.\"\n",
"HTML(f\"<p><span style='color:blue; font-weight:bold; font-size:14.0pt;'>TASK: {task}</span></p>\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import os\n",
"\n",
"from llm_utils.logging import prep_loggers\n",
"os.environ[\"LLM_LOG_PATH\"] = \"./logs/llm_log.txt\"\n",
"prep_loggers(\"llm=INFO\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The tools in DPK agents are the transforms.\n",
"# Each tool is described as a JSON dictionary with its name, description, input parameters, and how to import it.\n",
"# The list of the tools is in the llm_utils/tools.py file.\n",
"from llm_utils.dpk.tools import *\n",
"print(tools_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is an example of a plan for a simple task. It is passed to the prompt to enhance the planning results.\n",
"from llm_utils.dpk.examples import *\n",
"print(example_task)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is a string that contains several constraints on the order of the tools in the plan.\n",
"# It is free text and can be found in the llm_utils/constraints.py file.\n",
"from llm_utils.dpk.constraints import *\n",
"print(constraints)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define LLM models\n",
"\n",
"We have tested our project with the following LLM execution frameworks: [Watsonx](https://www.ibm.com/watsonx), [Replicate](https://replicate.com/), and locally running [Ollama](https://ollama.com/).\n",
"To use one of the frameworks, uncomment its part in the cell below while commenting out the other frameworks.\n",
"Please note that the notebooks have been tested with the specific Large Language Models (LLMs) mentioned in the cell; due to the inherent nature of LLMs, using a different model may not produce the same results.\n",
"\n",
"- To use Replicate:\n",
"  - Obtain a Replicate API token\n",
"  - Store the following value in the `.env` file located in your project directory:\n",
"    ```\n",
"    REPLICATE_API_TOKEN=<your Replicate API token>\n",
"    ```\n",
"- To use Ollama:\n",
"  - Download [Ollama](https://ollama.com/download).\n",
"  - Download one of the supported [models](https://ollama.com/search). We tested with the `llama3.3` model.\n",
"  - Update the `model_ollama_*` names if needed.\n",
"- To use Watsonx:\n",
"  - Register for Watsonx\n",
"  - Obtain its API key\n",
"  - Store the following values in the `.env` file located in your project directory:\n",
"    ```\n",
"    WATSONX_URL=<Watsonx entry point, e.g. https://us-south.ml.cloud.ibm.com>\n",
"    WATSON_PROJECT_ID=<your Watsonx project ID>\n",
"    WATSONX_APIKEY=<your Watsonx API key>\n",
"    ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llm_utils.models import getChatLLM\n",
"from dotenv import dotenv_values\n",
"\n",
"# watsonx part\n",
"# config = dotenv_values(\"./.env\")\n",
"# model_watsonx_id1 = \"ibm-granite/granite-3.1-8b-instruct\"\n",
"# model_watsonx_id2 = \"meta-llama/llama-3-1-70b-instruct\"\n",
"# model_watsonx_id3 = \"meta-llama/llama-3-3-70b-instruct\"\n",
"# model_watsonx_id4 = \"ibm/granite-34b-code-instruct\"\n",
"\n",
"# llm_plan = getChatLLM(\"watsonx\", model_watsonx_id2, config)\n",
"# llm_judge = getChatLLM(\"watsonx\", model_watsonx_id2, config)\n",
"# llm_generate = getChatLLM(\"watsonx\", model_watsonx_id2, config)\n",
"\n",
"# ollama part\n",
"# model_ollama = \"llama3.3\"\n",
"# llm_plan = getChatLLM(\"ollama\", model_ollama)\n",
"# llm_judge = getChatLLM(\"ollama\", model_ollama)\n",
"# llm_generate = getChatLLM(\"ollama\", model_ollama)\n",
"\n",
"# replicate part\n",
"config = dotenv_values(\"./.env\")\n",
"# You can use different LLM models\n",
"model_replicate_id1 = \"meta/meta-llama-3-70b-instruct\"\n",
"llm_plan = getChatLLM(\"replicate\", model_replicate_id1, config)\n",
"llm_judge = getChatLLM(\"replicate\", model_replicate_id1, config)\n",
"llm_generate = getChatLLM(\"replicate\", model_replicate_id1, config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langgraph.graph import StateGraph, END\n",
"from llm_utils.agent_helpers import *\n",
"from llm_utils.prompts.planner_prompt import *\n",
"from llm_utils.prompts.judge_prompt import *\n",
"from llm_utils.prompts.generate_prompt import *\n",
"from llm_utils.dpk.tools import *\n",
"from llm_utils.dpk.examples import *\n",
"from llm_utils.dpk.constraints import *\n",
"from functools import partial\n",
"\n",
"\n",
"# Create the graph\n",
"workflow = StateGraph(State)\n",
"\n",
"# Add nodes\n",
"workflow.add_node(\"planner\", partial(planner, prompt=planner_prompt_str, tools=tools_json, example=example_task1, context=constraints, llm=llm_plan))\n",
"workflow.add_node(\"judge\", partial(judge, prompt=judge_prompt_str_dpk, tools=tools_json, context=constraints, llm=llm_judge))\n",
"workflow.add_node(\"user_review\", get_user_review)\n",
"workflow.add_node(\"code generator\", partial(generator, prompt=generate_prompt_str_with_example, llm=llm_generate))\n",
"workflow.add_node(\"code validator\", code_validator_noop)\n",
"\n",
"# Add edges\n",
"workflow.set_entry_point(\"planner\")\n",
"workflow.add_edge(\"code generator\", \"code validator\")\n",
"workflow.add_edge(\"code validator\", END)\n",
"\n",
"# Add conditional edges from judge\n",
"workflow.add_conditional_edges(\n",
"    \"judge\",\n",
"    is_plan_OK,\n",
"    {\n",
"        False: \"planner\",    # If the plan needs revision, go back to the planner\n",
"        True: \"user_review\"  # If the plan is good, proceed to user review\n",
"    }\n",
")\n",
"\n",
"# Add conditional edges from planner\n",
"workflow.add_conditional_edges(\n",
"    \"planner\",\n",
"    need_judge,\n",
"    {\n",
"        True: \"judge\",        # If the plan should be judged, send it to the judge\n",
"        False: \"user_review\"  # Otherwise, proceed directly to user review\n",
"    }\n",
")\n",
"\n",
"workflow.add_conditional_edges(\n",
"    \"user_review\",\n",
"    is_user_review_OK,\n",
"    {\n",
"        False: \"planner\",     # If the user requests changes, go back to the planner\n",
"        True: \"code generator\",\n",
"    }\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"app = workflow.compile()\n",
"\n",
"from IPython.display import Image, display\n",
"\n",
"# display(Image(app.get_graph(xray=True).draw_mermaid_png()))\n",
"display(Image(app.get_graph().draw_mermaid_png()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the graph\n",
"initial_state = {\n",
"    \"task\": task,\n",
"    \"context\": \"\",\n",
"    \"plan\": [\"still no plan\"],\n",
"    \"planning_attempts\": 0,\n",
"    \"feedback\": \"Still no review\",\n",
"    \"needs_revision\": \"\",\n",
"    \"need_judge\": True,\n",
"}\n",
"\n",
"state = initial_state\n",
"\n",
"for output in app.stream(state):\n",
"    pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
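The graph wired up in the notebook routes a plan through planner, optional judge, user review, and code generator nodes. The same control flow can be mimicked without langgraph; the sketch below uses hypothetical stand-in functions (none of them are the notebook's real `planner`, `judge`, or `generator` implementations) just to illustrate the routing:

```python
# Stand-in node functions; the real nodes in the notebook call an LLM.
def planner(state):
    # Pretend the LLM produced a three-step DPK plan.
    state["plan"] = ["doc2parquet", "doc_quality", "dedup"]
    state["planning_attempts"] += 1
    return state

def judge(state):
    # Approve whenever the planner produced a non-empty plan.
    state["needs_revision"] = not state["plan"]
    return state

def user_review(state):
    # Pretend the user accepted the plan.
    state["approved"] = True
    return state

def generator(state):
    state["code"] = "# generated DPK pipeline for: " + " -> ".join(state["plan"])
    return state

def run(state):
    # planner -> (judge, looping back on revision) -> user review -> generator
    state = planner(state)
    if state["need_judge"]:
        state = judge(state)
        while state["needs_revision"]:
            state = judge(planner(state))
    state = user_review(state)
    if state["approved"]:
        state = generator(state)
    return state

state = run({"task": "filter and dedup PDFs", "plan": [],
             "planning_attempts": 0, "need_judge": True})
print(state["code"])
```

In the real notebook the routing decisions (`is_plan_OK`, `need_judge`, `is_user_review_OK`) are made by langgraph's conditional edges rather than plain `if`/`while` statements, but the cycle planner → judge → planner on rejection is the same.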

examples/agentic/README.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Agentic Data Agent Experiments

## Table of Contents
1. [Project Overview](#project-overview)
2. [Installation Guide](#installation-guide)
3. [Usage](#usage)


## Project Overview

This project focuses on automating Large Language Model (LLM) based workflows for data access and preparation.
It contains the following notebooks:

- [Planning_DPK_agent.ipynb](Planning_DPK_agent.ipynb): Planner for Data-Prep-Kit tasks with code generation. This notebook enables a data engineer (or data user) to efficiently build and run pipelines that perform tasks defined in natural language. It includes a langgraph LLM agent with several components, such as a planner, a judge, and a code generator. As its result, the agent can generate Python code for a DPK pipeline, which the user can run from the command line.

- [dpk_as_tools.ipynb](dpk_as_tools.ipynb): Use DPK transforms defined as [langchain tools](https://python.langchain.com/v0.1/docs/modules/tools/) or [llama-index tools](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/).
This notebook leverages an LLM to generate a DPK transforms pipeline based on natural language input.
The LLM processes the provided input and produces the pipeline in the correct format, making it ready for execution.
Subsequently, each transform in the pipeline is invoked by calling its langchain or llama-index implementation.


## Before you begin

Ensure that you have Python 3.11.

## Installation Guide

1. Clone the repository:
```bash
git clone git@github.com:IBM/data-prep-kit.git
cd data-prep-kit/examples/agentic
```

2. Create a Python virtual environment:
```bash
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install jupyter
pip install ipython && pip install ipykernel
pip install -r requirements.txt
```

3. Configure access to an LLM:

We have tested our project with the following LLM execution frameworks:
- [Replicate](https://replicate.com/)
- [Watsonx](https://www.ibm.com/watsonx)
- locally running [Ollama](https://ollama.com/) (on macOS)

3.1 Setup instructions for each framework:

The notebook cell that defines the models contains all frameworks, with only the `replicate` part uncommented. To use one of the other frameworks, uncomment its part in the cell while commenting out the other frameworks. Please note that each framework has been tested with a specific LLM; due to the inherent nature of LLMs, using a different model may not produce the same results.

- Replicate:
  - Obtain a Replicate API token
  - Store the following value in the `.env` file located in your project directory:
    ```
    REPLICATE_API_TOKEN=<your Replicate API token>
    ```
- Ollama:
  - Download [Ollama](https://ollama.com/download).
  - Download one of the supported [models](https://ollama.com/search). We tested with the `llama3.3` model.
  - Update the `model_ollama_*` names in the relevant cells if needed.
- Watsonx:
  - Register for Watsonx
  - Obtain its API key
  - Store the following values in the `.env` file located in your project directory:
    ```
    WATSONX_URL=<Watsonx entry point, e.g. https://us-south.ml.cloud.ibm.com>
    WATSON_PROJECT_ID=<your Watsonx project ID>
    WATSONX_APIKEY=<your Watsonx API key>
    ```
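The notebooks read these `.env` entries with python-dotenv's `dotenv_values`. For intuition, here is a rough stdlib-only sketch of what that style of loading does (the parser and the `demo.env` file name are illustrative, not python-dotenv's actual implementation):

```python
from pathlib import Path

def dotenv_values_sketch(path=".env"):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    config = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config

# Write an example .env file and load it back.
Path("demo.env").write_text("REPLICATE_API_TOKEN=r8_example\n# a comment\n")
config = dotenv_values_sketch("demo.env")
print(config["REPLICATE_API_TOKEN"])  # r8_example
```

Unlike `os.environ`, values loaded this way stay in a plain dictionary, which is why the notebooks pass `config` explicitly to `getChatLLM`.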

## Usage

To launch the notebooks, execute the following command in your terminal:
```bash
jupyter notebook
```

Once the Jupyter interface loads, select the desired notebook to begin working with it.

examples/agentic/dpk-requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
data-prep-toolkit==0.2.3
data-prep-toolkit-transforms[all,ray]==1.0.0a2
deepsearch-toolkit
