🔧 chore: update readme fix typos old parameters etc
firattamur committed Mar 19, 2024
1 parent ac535af commit 01a3c2c
Showing 1 changed file with 160 additions and 81 deletions.
241 changes: 160 additions & 81 deletions README.md
<br>

<p align="center">
<picture>
<img src="https://raw.githubusercontent.com/firattamur/llmdantic/main/.github/assets/llmdantic.png" alt="image" width="300">
</picture>
</p>

<h3 style="font-size: 5em" align="center">
Structured Output Is All You Need!
</h3>

<br>

LLMdantic is a powerful and efficient Python library that simplifies the integration of Large Language Models (LLMs) into your projects. Built on top of the incredible [Langchain](https://github.com/hwchase17/langchain) package and leveraging the power of [Pydantic](https://github.com/pydantic/pydantic) models, LLMdantic provides a seamless and structured approach to working with LLMs.

## Features 🚀

- 🌐 Wide range of LLM support through Langchain integrations
- 🛡️ Ensures data integrity with Pydantic models for input and output validation
- 🧩 Modular and extensible design for easy customization
- 💰 Cost tracking and optimization for OpenAI models
- 🚀 Efficient batch processing for handling multiple data points
- 🔄 Robust retry mechanism for smooth and uninterrupted experience

## Getting Started 🌟

### Requirements

Before using LLMdantic, make sure you have set the required API keys for the LLMs you plan to use. For example, if you're using OpenAI's models, set the `OPENAI_API_KEY` environment variable:

```bash
export OPENAI_API_KEY="your-api-key"
```

If you're using other LLMs, follow the instructions provided by the respective providers in Langchain's documentation.

### Installation

```bash
pip install llmdantic
```

### Usage

#### 1. Define input and output schemas using Pydantic:

- **inp_model**: Define the structure of the data you want to process.
- **out_model**: Define the structure of the data you expect to receive.
- Use Pydantic to define your models and add custom validation rules.
- Custom validation rules are used to ensure the integrity and quality of your data.
- Add docstrings to your custom validation rules to provide prompts for the LLM.

```python
from pydantic import BaseModel, field_validator


class SummarizeInput(BaseModel):
    text: str


class SummarizeOutput(BaseModel):
    summary: str

    @field_validator("summary")
    def summary_must_not_be_empty(cls, v: str) -> str:
        """Summary cannot be empty"""  # Add a docstring that explains the validation rule. This will be used as a prompt for the LLM.
        if not v.strip():
            raise ValueError("Summary cannot be empty")
        return v

    @field_validator("summary")
    def summary_must_be_short(cls, v: str) -> str:
        """Summary must be less than 100 words"""  # Add a docstring that explains the validation rule. This will be used as a prompt for the LLM.
        if len(v.split()) > 100:
            raise ValueError("Summary must be less than 100 words")
        return v
```
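
Because the rules are ordinary Pydantic validators, you can sanity-check them without calling an LLM at all. A minimal sketch, assuming the `SummarizeOutput` model defined above:

```python
from pydantic import ValidationError

try:
    SummarizeOutput(summary="   ")  # violates the "summary must not be empty" rule
except ValidationError as err:
    print(err)  # Pydantic reports which rule failed and why
```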

#### 2. Create an LLMdantic client:

- Provide input and output models, objective, and configuration.

> [!TIP]
>
> The `objective` is a prompt that will be used to generate the actual prompt sent to the LLM. It should be a high-level description of the task you want the LLM to perform.
>
> The `inp_schema` and `out_schema` are the input and output models you defined in the previous step.
>
> The `retries` parameter is the number of times LLMdantic will retry the request in case of failure.

```python
from llmdantic import LLMdantic, LLMdanticConfig
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text",
    inp_schema=SummarizeInput,
    out_schema=SummarizeOutput,
    retries=3,
)

llmdantic = LLMdantic(llm=llm, config=config)
```

Here's the prompt template generated based on the input and output models:

```text
Objective: Summarize the text
Input 'SummarizeInput':
{input}
Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:
SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS
{format_instructions}
```

#### 3. Generate output using LLMdantic:

> [!TIP]
>
> The `invoke` method is used for single requests, while the `batch` method is used for batch processing.
>
> The `invoke` method returns an instance of `LLMdanticResult`, which contains the generated text, the parsed output, and other useful information such as cost and usage stats (e.g., the number of input and output tokens). Check out the [LLMdanticResult](#LLMdanticResult) model for more details.

```python
from typing import Optional

from llmdantic import LLMdanticResult

data = SummarizeInput(text="A long article about natural language processing...")
result: LLMdanticResult = llmdantic.invoke(data)

output: Optional[SummarizeOutput] = result.output

if output:
    print(output.summary)
```
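
For reference, here is a rough sketch of what `invoke` returns. The field names follow the descriptions in this README (generated text, parsed output, retry count, cost, and token counts); the exact names and types are assumptions for illustration, not the library's actual class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMdanticResultSketch:
    """Illustrative stand-in for the data carried by LLMdanticResult."""

    text: str                          # raw text generated by the LLM
    output: Optional[SummarizeOutput]  # parsed output model, or None if parsing/validation failed
    retry_count: int                   # number of retries made to get the result
    cost: float                        # cost of the request (tracked for OpenAI models)
    inp_tokens: int                    # tokens used for the input
    out_tokens: int                    # tokens used for the output
    successful_requests: int           # number of successful requests made
```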

Here's the actual prompt sent to the LLM based on the input data:

```text
Objective: Summarize the text
Input 'SummarizeInput':
{'text': 'A long article about natural language processing...'}
Output 'SummarizeOutput''s fields MUST FOLLOW the RULES:
SummarizeOutput.summary:
• SUMMARY CANNOT BE EMPTY
• SUMMARY MUST BE LESS THAN 100 WORDS
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
{"properties": {"summary": {"title": "Summary", "type": "string"}}, "required": ["summary"]}
```

- For batch processing, pass a list of input data.

> [!IMPORTANT]
>
> The `batch` method returns a list of `LLMdanticResult` instances, each containing the generated text, the parsed output, and other useful information such as cost and usage stats (e.g., the number of input and output tokens). Check out the [LLMdanticResult](#LLMdanticResult) model for more details.
>
> The `concurrency` parameter is the number of concurrent requests to make. Please check the usage limits of the LLM provider before setting this value.

```python
from typing import List

data: List[SummarizeInput] = [
    SummarizeInput(text="A long article about natural language processing..."),
    SummarizeInput(text="A long article about computer vision..."),
]
results: List[LLMdanticResult] = llmdantic.batch(data, concurrency=2)

for result in results:
    if result.output:
        print(result.output.summary)
```
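
Since each `LLMdanticResult` carries its own stats, batch results can be aggregated directly. A small sketch, assuming the `results` list from the snippet above:

```python
total_cost = sum(result.cost for result in results)
valid_outputs = [result.output for result in results if result.output]

print(f"Parsed {len(valid_outputs)}/{len(results)} outputs for a total cost of {total_cost}")
```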

#### 4. Monitor usage and costs:

> [!IMPORTANT]
>
> The cost tracking feature is currently available for OpenAI models only.
>
> The `usage` attribute returns an instance of `LLMdanticUsage`, which contains the number of input and output tokens, successful requests, cost, and successful outputs. Check out the [LLMdanticUsage](#LLMdanticUsage) model for more details.
>
> Please note that usage is tracked for the entire lifetime of the `LLMdantic` instance.

- Use the `cost` attribute of the `LLMdanticResult` to track the cost of a request (currently available for OpenAI models).
- Use the `usage` attribute of the `LLMdantic` to track overall usage stats.

```python
from llmdantic import LLMdanticResult

data: SummarizeInput = SummarizeInput(text="A long article about natural language processing...")
result: LLMdanticResult = llmdantic.invoke(data)

if result.output:
    print(result.output.summary)

# Track the cost of the request (OpenAI models only)
print(f"Cost: {result.cost}")

# Track the usage stats
print(f"Usage: {llmdantic.usage}")
```

```bash
Cost: 0.0003665
Overall Usage: LLMdanticUsage(
    inp_tokens=219,
    out_tokens=19,
    total_tokens=238,
    successful_requests=1,
    cost=0.000367,
    successful_outputs=1
)
```
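
If you need individual numbers rather than the printed repr, the fields shown above can be read off the usage object directly. A small sketch; the attribute names are taken from the output above and are assumptions about the API:

```python
usage = llmdantic.usage

# Attribute names mirror the printed repr above (assumed, not verified against the library).
print(f"Input tokens:  {usage.inp_tokens}")
print(f"Output tokens: {usage.out_tokens}")
print(f"Total cost:    {usage.cost}")
print(f"Valid outputs: {usage.successful_outputs}")
```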

## Advanced Usage 🛠

**LLMdantic** is built on top of the `langchain` package, which provides a modular and extensible framework for working with LLMs. You can easily switch between different LLMs and customize your experience.

### Switching LLMs

> [!IMPORTANT]
>
> Make sure to set the required API keys for the new LLM you plan to use.
>
> The `llm` parameter of the `LLMdantic` class should be an instance of `BaseLanguageModel` from the langchain package.

> [!TIP]
>
> You can use the `langchain_community` package to access a wide range of LLMs from different providers.
>
> You may need to provide `model_name`, `api_key`, and other parameters based on the LLM you want to use. Check out the documentation of the respective LLM provider for more details.

- **LLMdantic** uses the `OpenAI` LLM by default. You can switch to a different LLM by providing an instance of the desired LLM.

```python
from llmdantic import LLMdantic, LLMdanticConfig
from langchain.llms.base import BaseLanguageModel
from langchain_community.llms import Ollama

llm: BaseLanguageModel = Ollama()

config: LLMdanticConfig = LLMdanticConfig(
    objective="Summarize the text",
    inp_schema=SummarizeInput,
    out_schema=SummarizeOutput,
    retries=3,
)

llmdantic = LLMdantic(llm=llm, config=config)
```
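
As the tip above notes, most providers accept extra parameters. For example, a specific local model can be selected by name when using `langchain_community`'s `Ollama` class (the parameter values below are placeholders; adjust them for your setup):

```python
from langchain_community.llms import Ollama

# Pick a specific local model and make generation deterministic.
llm = Ollama(model="mistral", temperature=0)

llmdantic = LLMdantic(llm=llm, config=config)
```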

## Contributing 🤝
Contributions are welcome! Whether you're fixing bugs, adding new features, or improving documentation, your help is appreciated.
## License 📄

**LLMdantic** is released under the [MIT License](LICENSE). Feel free to use it, contribute, and spread the word!
