Add createExtractFunction for data extraction
Introduce `createExtractFunction` using OpenAI's structured output
feature, leveraging a Zod schema for data validation and extraction.

This is a better and more modern alternative to the `createAIExtractFunction`
utility, which is now deprecated.
rileytomasek committed Oct 13, 2024
1 parent 3eb2fa8 commit f25feab
Showing 6 changed files with 128 additions and 7 deletions.
11 changes: 5 additions & 6 deletions examples/extract-people-names.ts
@@ -1,15 +1,14 @@
 import 'dotenv/config';

 import { ChatModel } from '@dexaai/dexter';
-import { createAIExtractFunction } from '@dexaai/dexter/ai-function';
+import { createExtractFunction } from '@dexaai/dexter/extract';
 import { z } from 'zod';

 /** A function to extract people names from text. */
-const extractPeopleNamesRunner = createAIExtractFunction({
-  chatModel: new ChatModel({ params: { model: 'gpt-4-1106-preview' } }),
-  systemMessage: `You use functions to extract people names from a message.`,
-  name: 'log_people_names',
-  description: `Use this to log the full names of people from a message. Don't include duplicate names.`,
+const extractPeopleNamesRunner = createExtractFunction({
+  chatModel: new ChatModel({ params: { model: 'gpt-4o-mini' } }),
+  systemMessage: `You extract the names of people from unstructured text.`,
+  name: 'people_names',
   schema: z.object({
     names: z.array(
       z
5 changes: 5 additions & 0 deletions package.json
@@ -22,6 +22,10 @@
"./ai-function": {
"types": "./dist/ai-function/index.d.ts",
"import": "./dist/ai-function/index.js"
},
"./extract": {
"types": "./dist/extract/index.d.ts",
"import": "./dist/extract/index.js"
}
},
"sideEffects": false,
@@ -50,6 +54,7 @@
"jsonrepair": "^3.8.1",
"ky": "^1.7.2",
"openai-fetch": "3.3.1",
"openai-zod-to-json-schema": "^1.0.2",
"p-map": "^7.0.2",
"p-throttle": "^6.2.0",
"parse-json": "^8.1.0",
13 changes: 13 additions & 0 deletions pnpm-lock.yaml

Some generated files are not rendered by default.

53 changes: 52 additions & 1 deletion readme.md
@@ -15,6 +15,8 @@ Dexter is a powerful TypeScript library for working with Large Language Models (

- **Advanced AI Function Utilities**: Tools for creating and managing AI functions, including `createAIFunction`, `createAIExtractFunction`, and `createAIRunner`, with Zod integration for schema validation.

- **Structured Data Extraction**: Dexter supports OpenAI's structured output feature through `createExtractFunction`, which sends a JSON schema derived from a Zod schema in the `response_format` parameter.

- **Flexible Caching and Tokenization**: Built-in caching system with custom cache support, and advanced tokenization based on `tiktoken` for accurate token management.

- **Robust Observability and Control**: Customizable telemetry system, comprehensive event hooks, and specialized error handling for enhanced monitoring and control.
@@ -103,10 +105,41 @@ async function main() {
main().catch(console.error);
```

### Extracting Structured Data

```typescript
import { ChatModel } from '@dexaai/dexter';
import { createExtractFunction } from '@dexaai/dexter/extract';
import { z } from 'zod';

const extractPeopleNames = createExtractFunction({
  chatModel: new ChatModel({ params: { model: 'gpt-4o-mini' } }),
  systemMessage: `You extract the names of people from unstructured text.`,
  name: 'people_names',
  schema: z.object({
    names: z.array(
      z.string().describe(
        `The name of a person from the message. Normalize the name by removing suffixes, prefixes, and fixing capitalization`
      )
    ),
  }),
});

async function main() {
  const peopleNames = await extractPeopleNames(
    `Dr. Andrew Huberman interviewed Tony Hawk, an idol of Andrew Hubermans.`
  );
  console.log('peopleNames', peopleNames);
}

main().catch(console.error);
```

### Using AI Functions

```typescript
-import { ChatModel, createAIFunction, MsgUtil } from '@dexaai/dexter';
+import { ChatModel, MsgUtil } from '@dexaai/dexter';
+import { createAIFunction } from '@dexaai/dexter/ai-function';
import { z } from 'zod';

const getWeather = createAIFunction(
@@ -271,6 +304,22 @@ new SparseVectorModel(args: SparseVectorModelArgs<CustomCtx>)
- `extend(args?: PartialSparseVectorModelArgs<CustomCtx>): SparseVectorModel<CustomCtx>`
- Creates a new instance of the model with modified configuration

### Extract Functions

#### createExtractFunction

Creates a function that extracts structured data from text using OpenAI's structured output feature.
Prefer it over the deprecated `createAIExtractFunction`, which relies on function calling.

```typescript
createExtractFunction<Schema extends z.ZodObject<any>>(args: {
  chatModel: Model.Chat.Model;
  name: string;
  schema: Schema;
  systemMessage: string;
}): (input: string | Msg) => Promise<z.infer<Schema>>
```

### AI Functions

#### createAIFunction
@@ -375,6 +424,8 @@ Dexter uses the `openai-fetch` library to interact with the OpenAI API. This cli
4. **Streaming Support**: The `openai-fetch` client supports streaming responses, which Dexter utilizes for real-time output in chat models.
5. **Structured Output**: Dexter supports OpenAI's structured output feature through `createExtractFunction`, which sends a JSON schema derived from a Zod schema in the `response_format` parameter.
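
As a rough illustration of the `response_format` payload described in the Structured Output item, the fragment for the people-names schema might look like the hand-written object below. This is a sketch under assumptions: the JSON actually emitted by `openai-zod-to-json-schema` may differ in detail, and `peopleNamesResponseFormat` is a hypothetical name that is not part of Dexter.

```typescript
// Hand-written approximation of the `response_format` value for a
// `{ names: string[] }` schema. Not generated by the library; the exact
// converter output is an assumption.
const peopleNamesResponseFormat = {
  type: 'json_schema',
  json_schema: {
    // Identifies the schema in the request.
    name: 'people_names',
    // Ask the API to constrain output to the schema.
    strict: true,
    schema: {
      type: 'object',
      properties: {
        names: {
          type: 'array',
          items: {
            type: 'string',
            description:
              'The name of a person from the message. Normalize the name by removing suffixes, prefixes, and fixing capitalization',
          },
        },
      },
      // Strict mode expects every property to be listed as required
      // and additional properties to be disallowed.
      required: ['names'],
      additionalProperties: false,
    },
  },
};
```

With `strict: true`, the model's reply should be JSON text matching this schema, which is why the extract function can parse it and hand it straight to the Zod schema for validation.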
### Message Types and MsgUtil
Dexter defines a set of message types (`Msg`) that closely align with the OpenAI API's message formats but with some enhancements for better type safety and easier handling. The `MsgUtil` class provides methods for creating, checking, and asserting these message types.
1 change: 1 addition & 0 deletions src/ai-function/ai-extract-function.ts
@@ -7,6 +7,7 @@ import { type ExtractFunction } from './types.js';

/**
* Use OpenAI function calling to extract data from a message.
* @deprecated Use `createExtractFunction()` from `@dexaai/dexter/extract` instead.
*/
export function createAIExtractFunction<Schema extends z.ZodObject<any>>(
{
52 changes: 52 additions & 0 deletions src/extract/index.ts
@@ -0,0 +1,52 @@
import { zodToJsonSchema } from 'openai-zod-to-json-schema';
import { type z } from 'zod';

import { type Model, type Msg, MsgUtil } from '../model/index.js';

/**
 * Extract data using OpenAI structured outputs.
 *
 * Always returns an object satisfying the provided Zod schema.
 */
export function createExtractFunction<Schema extends z.ZodObject<any>>(args: {
  /** The ChatModel used to make API calls. */
  chatModel: Model.Chat.Model;
  /** A descriptive name for the object to extract. */
  name: string;
  /** The Zod schema for the data to extract. */
  schema: Schema;
  /** Add a system message to the beginning of the messages array. */
  systemMessage: string;
}): (input: string | Msg) => Promise<z.infer<Schema>> {
  const { chatModel, name, schema, systemMessage } = args;

  async function runExtract(input: string | Msg): Promise<z.infer<Schema>> {
    const inputVal = typeof input === 'string' ? input : (input.content ?? '');
    const messages: Msg[] = [
      MsgUtil.system(systemMessage),
      MsgUtil.user(inputVal),
    ];

    const { message } = await chatModel.run({
      messages,
      response_format: {
        type: 'json_schema',
        json_schema: {
          name, // identifies the schema in the request
          strict: true,
          schema: zodToJsonSchema(schema, {
            $refStrategy: 'none',
            openaiStrictMode: true,
          }),
        },
      },
    });

    MsgUtil.assertAssistant(message);

    const json = JSON.parse(message.content);
    return schema.parse(json);
  }

  return runExtract;
}
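
The input normalization and the parse-then-validate step at the end of `runExtract` can be sketched without any dependencies. This is a minimal illustration under assumptions, not Dexter's implementation: `SimpleMsg`, `PeopleNames`, `toUserContent`, and `parsePeopleNames` are hypothetical names, a hand-rolled shape check stands in for `schema.parse`, and a canned string stands in for the assistant's `message.content`.

```typescript
// Hypothetical stand-ins for illustration only; none of these are Dexter APIs.
type SimpleMsg = { role: string; content?: string | null };
type PeopleNames = { names: string[] };

/** Strings pass through; messages yield their content, or '' when it is missing. */
function toUserContent(input: string | SimpleMsg): string {
  return typeof input === 'string' ? input : (input.content ?? '');
}

/** Parse the reply text as JSON, then check its shape (standing in for `schema.parse`). */
function parsePeopleNames(raw: string): PeopleNames {
  const json: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof json !== 'object' || json === null) {
    throw new Error('Expected a JSON object');
  }
  const names = (json as { names?: unknown }).names;
  if (!Array.isArray(names) || !names.every((n) => typeof n === 'string')) {
    throw new Error('Expected `names` to be an array of strings');
  }
  return { names: names as string[] };
}

// A canned assistant reply, standing in for `message.content`.
const cannedReply = '{"names":["Andrew Huberman","Tony Hawk"]}';
const extracted = parsePeopleNames(cannedReply);
```

Both steps can throw, on malformed JSON or on a shape mismatch, so callers of an extract function may want to wrap the call in `try`/`catch`.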
