-
We should distinguish between hermetic user-defined functions and non-hermetic ones: we use validating IDs against a DB as an example, but the proposal doesn't add support for this. Other issues I see:
-
The validation error should probably include the parsed payload as well, either way.
-
Will a stream be cancelled if there's a validation error? Or do we fulfill it to completion?
-
Needs a section on reusing constraints -- is there some way in the future we could use variables? Or a type alias with a constraint?
-
Use the keyword "scope" or "block" instead of "super", to avoid steering users toward inheritance concepts.
-
in ts now,
-
Needs more details on what a validation error looks like -- for BAMLErrors we expose the raw LLM response. Are we exposing the same thing plus the parsed object?
-
This is now complete, see: https://docs.boundaryml.com/guide/baml-advanced/validations
-
Stage
Validator / Type constraints
Problem
When defining a schema, a type alone is often insufficient to constrain a value. Constraints such as a regex match on a string, array length, or containment are just as necessary. You may want to ensure an ID exists in your DB before doing X, or that a string is a substring of Y.
Currently, users have to post-process the class in their native code to get validation errors, then collect those errors and re-plumb them through the LLM to attempt a fix. The problem is amplified when classes / enums have aliases, because the raw string the user has is hard to connect back to the data model before re-running it through another prompt.
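A minimal Python sketch of the manual workflow described above, for illustration only. The `Resume` class, its fields, and the specific checks are hypothetical; in practice the object would be whatever your BAML function returns.

```python
import re
from dataclasses import dataclass


@dataclass
class Resume:
    """Hypothetical parsed LLM output; stands in for a generated BAML class."""
    name: str
    email: str
    skills: list[str]


def collect_errors(resume: Resume) -> list[str]:
    """Hand-written checks that a plain type cannot express."""
    errors = []
    if len(resume.name) < 3:
        errors.append("name: must be at least 3 characters")
    if not re.fullmatch(r".+@.+", resume.email):
        errors.append("email: must look like an email address")
    if not 1 <= len(resume.skills) <= 5:
        errors.append("skills: must contain between 1 and 5 items")
    return errors


def build_fix_prompt(raw_output: str, errors: list[str]) -> str:
    """Re-plumb the collected errors into a follow-up prompt for the LLM."""
    bullet_list = "\n".join(f"- {e}" for e in errors)
    return (
        "The previous answer failed validation:\n"
        f"{bullet_list}\n\n"
        f"Previous answer:\n{raw_output}\n\n"
        "Return a corrected answer."
    )


raw = '{"name": "Al", "email": "not-an-email", "skills": []}'
parsed = Resume(name="Al", email="not-an-email", skills=[])
errors = collect_errors(parsed)
if errors:
    print(build_fix_prompt(raw, errors))
```

Every check and the retry prompt live outside the schema here, which is the duplication this proposal aims to remove.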
Solution
Some syntax in BAML that natively allows the user to apply JSON-schema-like constraints onto fields.
Types of constraints
As a general goal, we should keep the constraint system as flexible as possible, so it's possible for users to add their own custom logic and not be bottlenecked on us shipping features. That means `custom function` is top priority in the design. Capabilities the user cannot trivially add are also high priority (regex, operators, etc.).
Note: Network-based constraints are considered out of scope, as BAML has no plans to support a socket interface.
Prior work
JSON Schema (`/foo/bar` - used to point to a JSON object like jq)
PKL

    class Bird {
      name: String(length >= 3)
      parent: String(this != name) // notice it references another type
    }

    pigeon: Bird = new {
      name = "Pigeon"
      parent = "Pigeon Sr."
    }

    class Project {
      local emailAddress = (str) -> str.matches(Regex(#".+@.+"#))
      // constrain the nullable type's element type
      type: String(contains("source"))?
      // constrain the map type and its key/value types
      contacts: Map<String(!isEmpty), String(emailAddress)>(length <= 5)
    }

    project: Project = new {
      type = "open-source"
      contacts = Map("Pigeon", "[email protected]")
    }

Note: With validators, our docs may become very hard to navigate. We would likely benefit from making a page per type that outlines everything about it.
Most references do this, e.g.:
https://json-schema.org/understanding-json-schema/reference/string
Alternatively, like PKL, we could do a single reference page which has everything in one place:
https://pkl-lang.org/main/current/language-reference/index.html#strings (cmd+f is much easier)
[recommended] Option 1: Attribute
We introduce a primitive attribute: `@constraint()`
This takes a Jinja expression as a parameter, where the word `this` denotes that property.
Keywords for expr:
- `this` - denotes the current type
- `super` - references the object in the outermost block. In the case of non-class types, this would be nonsense and unstable.
The second parameter is optionally the error message to show when the constraint fails.
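As a rough Python analogy of these semantics (not BAML syntax), the sketch below binds `this` to the field value and `outer` to the enclosing object, standing in for `super`. The `Constraint` and `check_field` names, the `Signup` class, and the truthiness-based evaluation are all assumptions made for illustration; a plain callable replaces the Jinja expression.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Constraint:
    # Hypothetical stand-in for @constraint(expr, message).
    # The callable receives `this` (the field value) and `outer`
    # (the enclosing object, playing the role of `super`).
    expr: Callable[[Any, Any], Any]
    message: Optional[str] = None


def check_field(obj: Any, field_name: str, constraints: list[Constraint]) -> list[str]:
    """Return one message per failed constraint on obj.<field_name>."""
    this = getattr(obj, field_name)
    failures = []
    for c in constraints:
        # Assumption: the expression result is judged by truthiness.
        if not c.expr(this, obj):
            failures.append(c.message or f"constraint failed on {field_name}")
    return failures


@dataclass
class Signup:
    password: str
    confirm_password: str


signup = Signup(password="hunter22", confirm_password="hunter2")

# A constraint that only looks at the field itself.
print(check_field(signup, "password", [Constraint(lambda this, outer: len(this) >= 8)]))

# A constraint that compares the field with a sibling via the outer object.
print(check_field(signup, "confirm_password", [
    Constraint(lambda this, outer: this == outer.password, "password_match"),
]))
```

Several constraints on one field are simply several entries in the list; each failure is reported with its own message, or a generic one if none was given.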
Example:
Note that due to unions, it's important to understand exactly where the `@constraint` attribute attaches.
Note that the two above are different. In the first, the `@constraint` applies to the field `Foo.bar`, but in the second, `@constraint` applies to the `int`. In practice, they would do the same thing, but this may cause trouble later.
Does it modify the prompt? No. If we want to support generic functions as constraints, we cannot intelligently modify the prompt. The user is expected to modify the prompt appropriately.
More complicated examples:
Referencing a field
We use `super` to refer to a different field in the class inside a `@constraint`.
`@@constraint` is a block-level attribute that applies to the class, and you can refer to all the properties of the class using `this`.
The difference between these two is the error. The first raises an error with message `password_match` on class `Foo`. The second raises an error on `Foo.confirm_password`.
Referencing a field in a different class:
`@ref` is used to reference temporary data passed in from parent classes.
Example: Here we define a Department class that contains employees. The Employee class will validate that the salary follows the department's salary cap.
The syntax looks like you're instantiating a class "Employee", since that's what we're used to seeing in other languages. However, it really means that the type `Employee` has 1 reference passed in: `max_salary`, which comes from `Department`.
References bubble up until they are passed in.
What does this mean? If a child class has a `@ref` (Employee), the parent class (Department) must instantiate that child with that ref. If it doesn't, then Department implicitly also has that same ref (max_salary).
We will add a third top-level parent called “Corporation” that passes in the ref:
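As a rough Python analogy of the bubbling rule (not BAML syntax), the sketch below computes which refs a class still needs from whoever instantiates it. `ClassDef`, `passes`, and `unresolved_refs` are hypothetical names, and the model simplifies by assuming a class passes the same refs to all of its children.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """Hypothetical model of a class, used only to illustrate @ref bubbling."""
    name: str
    refs: set[str] = field(default_factory=set)            # refs declared on this class
    children: list["ClassDef"] = field(default_factory=list)
    passes: set[str] = field(default_factory=set)           # refs this class supplies to children


def unresolved_refs(cls: ClassDef) -> set[str]:
    """Refs that `cls` still needs from whoever instantiates it."""
    needed = set(cls.refs)
    for child in cls.children:
        # Anything a child needs that this class does not pass in bubbles up.
        needed |= unresolved_refs(child) - cls.passes
    return needed


employee = ClassDef("Employee", refs={"max_salary"})
department = ClassDef("Department", children=[employee])
corporation = ClassDef("Corporation", children=[department], passes={"max_salary"})

print(unresolved_refs(department))   # Department implicitly carries max_salary
print(unresolved_refs(corporation))  # nothing left to resolve at the top level
```

Under this reading, "refs must be resolved" amounts to requiring that the top-level type ends up with an empty set.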
Passing references in functions
Sometimes we want to validate that the output matches a string in the input, like when asking an LLM to generate citations. Or you may just want to constrain the output type of a specific `function` in a certain way.
You can also use inputs from function parameters
Note: we highlight functions and types differently in BAML (not in this doc)
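To illustrate the idea of a constraint that depends on a function input, here is a minimal Python sketch of the equivalent check done by hand. `Citation`, `ConstraintFailed`, and `validate_citations` are hypothetical names, not part of BAML or its generated clients.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    quote: str


class ConstraintFailed(Exception):
    """Hypothetical stand-in for the error raised when a constraint fails."""


def validate_citations(document: str, citations: list[Citation]) -> list[Citation]:
    """Require every quote to appear verbatim in the function's input document."""
    for i, citation in enumerate(citations):
        if citation.quote not in document:
            raise ConstraintFailed(
                f"citations[{i}].quote is not a substring of the input document"
            )
    return citations


document = "Paris is the capital of France. It lies on the Seine."

validate_citations(document, [Citation("capital of France")])  # passes

try:
    validate_citations(document, [Citation("capital of Spain")])
except ConstraintFailed as err:
    print(err)
```

The point of passing references in functions is that `document` would come from the function parameters, so the check no longer has to be repeated in calling code.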
When validations fail…
Your BAML function will raise a `BamlValidationError`, similar to what happens when parsing fails.
Validating fields without throwing exceptions
You can make any constraint error a warning that is available in your Python/TS code by adding a third parameter:
This will have the following impact on your generated bindings:
The parameter `error_type="warning"` in `@constraint` would automatically change the type of `quote string` to `quote BamlValidated<string>` in Python / TS, where `BamlValidated` is the following:
Another example (RAG with citations)
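As a hedged illustration of warning-style validation applied to citations, the Python sketch below wraps each value together with its warnings instead of raising. The `Validated` class and its `value` / `warnings` / `is_valid` members are assumptions made for this sketch; the actual shape of `BamlValidated` is not specified here and may differ.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Validated(Generic[T]):
    """Hypothetical wrapper: a value plus the names of any failed constraints."""
    value: T
    warnings: list[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.warnings


def validate_quote(quote: str, document: str) -> Validated[str]:
    """Record a warning instead of raising when the quote is not in the document."""
    warnings = []
    if quote not in document:
        warnings.append("quote_not_in_document")
    return Validated(quote, warnings)


document = "Paris is the capital of France. It lies on the Seine."
quotes = ["lies on the Seine", "lies on the Thames"]

results = [validate_quote(q, document) for q in quotes]
trusted = [r.value for r in results if r.is_valid]
print(trusted)
```

With this shape the caller decides what to do with a failed quote (drop it, show it with a flag, retry), rather than losing the whole response to an exception.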
To summarize:
- `@constraint` is an attribute that can be attached to any type or class property (not enum value) and takes a couple of parameters
- block-level `@@constraint`
- `@ref` can be added to a class as an invisible, BAML-only property that doesn't actually get shared anywhere but BAML. It is purely a ref to some other value.
- `@ref` bubbles up, but if there's ever a naming conflict, it must be explicitly set
- `@ref`s must be resolved
More complex examples
What does it look like with many validations? Is it readable? Can I understand it?
Employee / department “interdependent classes with reference passing”
Disambiguating ‘this’
One of the current issues with the syntax is that “this” refers to different things in different contexts.
`this` in a constraint maps to the value of the field itself.
Using `this` when passing references to a type maps to the whole class.
What if we support `this` inside a new hypothetical class property that's computed at runtime using `@compute`? There, `this` also refers to `Myclass`.
We're OK with the current approach: the IDE / extension can help disambiguate what `this` means, and there are a limited number of contexts.
Should we require block strings around evaluatable expressions?
Or just disambiguate them automatically as only available inside of `@constraint`? The only issue is that now in `@alias` we don't support this.
The best solution may be: `unquoted strings`. They are only explicitly supported in certain locations.
Dynamic Classes
`.constraint` method eventually, but this is lower priority
Questions
Is this slow?
How do we do regex matching?
- `matches` (usage e.g. `my_var|matches(#"^[\w]+$"#)`)
- `/^[\w]+$/` is out of scope as we would need to add a new string
How do I add multiple constraints?
Can I validate UUIDs?
How could we support custom hooks to do something like a DB check?
Can users modify the data in the field?
Do we do retries or anything clever on schema failures?
BamlValidationError
What happens when the expression doesn’t resolve to a boolean?
When does the validator run?
Can users just add their own pydantic validations as well?
Can I do non-1-liners?
Can my one-liner be formatted across multiple lines?
Can you transform these validations into zod/pydantic validations?
Short-hand
Later we can introduce shorthand validators, but they are all just syntactic sugar, which makes implementing them much easier.
This is out of scope because we should implement `@constraint` first.
Appendix A: What we learned from our citations RAG demo:
https://baml-examples.vercel.app/examples/rag
(and how this validation would work to make it easier)
What we added to help?
- `error_type="warning"`
- `@ref`
In the future we can add a custom filter `closest_substring_match` which does:
Appendix B: Alternative @ref ideas
One option is to refer to other properties from other classes using some special syntax inside a constraint, instead of having to define @ref’s explicitly and passing them down.
Here are all the issues:
$
option 2:
option 3
option 4
Appendix C: Integration with Playground / UX / additional tooling
Option 2: PKL-like syntax (rejected)
We likely can't do this, as BAML doesn't support the powerful expressions PKL does as a top-level AST node. Once we add AST support for top-level expression nodes, we'd be able to consider this, but as of now, it's out of scope for technical reasons. Additionally, we must support Jinja as our primary expression resolver (`foo + bar`, as otherwise we'd have to write that from scratch).
e.g.