Commit fde48c8

Robin Picard authored and committed
Update the documentation of the regex dsl
1 parent 3ca64b3 commit fde48c8

1 file changed: +71 -43 lines changed

docs/reference/regex_dsl.md

Lines changed: 71 additions & 43 deletions
````diff
@@ -39,51 +39,55 @@ print(to_regex(digit)) # Output: [0-9]+

 ---

-## Early Introduction to Quantifiers & Operators
+## Early Introduction to Quantifiers & Combining Terms

 The DSL supports common regex quantifiers as methods on every `Term`. These methods allow you to specify how many times a pattern should be matched. They include:

-- **`times(count)`**: Matches the term exactly `count` times.
+- **`exactly(count)`**: Matches the term exactly `count` times.
 - **`optional()`**: Matches the term zero or one time.
 - **`one_or_more()`**: Matches the term one or more times (Kleene Plus).
 - **`zero_or_more()`**: Matches the term zero or more times (Kleene Star).
-- **`repeat(min_count, max_count)`**: Matches the term between `min_count` and `max_count` times (or open-ended if one value is omitted).
+- **`between(min_count, max_count)`**: Matches the term between `min_count` and `max_count` times (inclusive).
+- **`at_least(count)`**: Matches the term at least `count` times.
+- **`at_most(count)`**: Matches the term up to `count` times.

-Let’s see these quantifiers side by side with examples.
+These quantifiers can also be used as functions that take the `Term` as an argument. If the term is a plain string, it will be automatically converted to a `String` object. Thus `String("foo").optional()` is equivalent to `optional("foo")`.
+
+Let's see these quantifiers side by side with examples.

 ### Quantifiers in Action

-#### `times(count)`
+#### `exactly(count)`

 This method restricts the term to appear exactly `count` times.

 ```python
 # Example: exactly 5 digits
-five_digits = Regex(r"\d").times(5)
+five_digits = Regex(r"\d").exactly(5)
 print(to_regex(five_digits)) # Output: (\d){5}
 ```

-You can also use the `times` function:
+You can also use the `exactly` function:

 ```python
-from outlines.types import times
+from outlines.types import exactly

 # Example: exactly 5 digits
-five_digits = times(Regex(r"\d"), 5)
+five_digits = exactly(Regex(r"\d"), 5)
 print(to_regex(five_digits)) # Output: (\d){5}
 ```

 #### `optional()`

-The `optional()` method makes a term optional, meaning it may occur zero or one time.
+This method makes a term optional, meaning it may occur zero or one time.

 ```python
 # Example: an optional "s" at the end of a word
 maybe_s = String("s").optional()
 print(to_regex(maybe_s)) # Output: (s)?
 ```

-You can also use the `optional` function, the string will automatically be converted to a `String` object:
+You can also use the `optional` function:

 ```python
 from outlines.types import optional
````
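The new paragraph in this hunk says every quantifier exists both as a `Term` method and as a free function that promotes plain strings to `String`. The toy re-implementation below sketches that design in plain Python so the two spellings can be compared side by side. It is not outlines' code: the class and function names mirror the ones in the docs, but their bodies are assumptions, and `re.escape` is an assumed stand-in for whatever escaping the real `String` applies.

```python
import re


class Term:
    """Toy term that wraps a regex fragment (hypothetical, not outlines' Term)."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern

    def exactly(self, count: int) -> "Term":
        # Group the fragment so the quantifier applies to all of it.
        return Term(f"({self.pattern}){{{count}}}")

    def optional(self) -> "Term":
        return Term(f"({self.pattern})?")


class Regex(Term):
    """Raw regex fragment, used verbatim."""


class String(Term):
    """Literal text; re.escape is an assumed stand-in for the DSL's escaping."""

    def __init__(self, value: str) -> None:
        super().__init__(re.escape(value))


def _as_term(term: "Term | str") -> Term:
    # Plain Python strings are promoted to String, as the docs describe.
    return String(term) if isinstance(term, str) else term


def exactly(term: "Term | str", count: int) -> Term:
    return _as_term(term).exactly(count)


def optional(term: "Term | str") -> Term:
    return _as_term(term).optional()


def to_regex(term: Term) -> str:
    return term.pattern


# Method and function spellings build the same pattern.
print(to_regex(Regex(r"\d").exactly(5)))   # (\d){5}
print(to_regex(exactly(Regex(r"\d"), 5)))  # (\d){5}
print(to_regex(String("s").optional()))    # (s)?
print(to_regex(optional("s")))             # (s)?
```

Routing each free function through `_as_term` keeps a single implementation per quantifier; the function form only adds the string-to-`String` promotion.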
````diff
@@ -116,15 +120,15 @@ print(to_regex(letters)) # Output: ([A-Za-z])+

 #### `zero_or_more()`

-This method means that the term can occur zero or more times.
+This method indicates that the term can occur zero or more times.

 ```python
 # Example: zero or more spaces
 spaces = String(" ").zero_or_more()
 print(to_regex(spaces)) # Output: ( )*
 ```

-You can also use the `zero_or_more` function, the string will automatically be converted to a `String` instance:
+You can also use the `zero_or_more` function:

 ```python
 from outlines.types import zero_or_more
````
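The `( )*` output kept as context in this hunk can be checked directly with Python's standard `re` module. This sketch only exercises the printed pattern string; it does not import outlines, and the `separator` pattern is an illustrative string concatenation, not the DSL's `+` operator.

```python
import re

# Pattern printed by to_regex(String(" ").zero_or_more()) in the docs.
spaces = r"( )*"

# Zero or more spaces: the empty string and runs of spaces both match in full.
for text in ["", " ", "    "]:
    assert re.fullmatch(spaces, text) is not None

# Anything other than spaces is rejected by a full match.
assert re.fullmatch(spaces, " a ") is None

# Used as a separator, it tolerates any amount of spacing, including none.
separator = "key:" + spaces + "value"
print([bool(re.fullmatch(separator, s)) for s in ["key:value", "key:   value", "key:\tvalue"]])
# [True, True, False]
```

The tab case fails because the group matches a literal space only; `one_or_more()` (`+`) would additionally reject `key:value`.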
````diff
@@ -134,40 +138,64 @@ spaces = zero_or_more(" ")
 print(to_regex(spaces)) # Output: ( )*
 ```

-#### `repeat(min_count, max_count)`
+#### `between(min_count, max_count)`

-The `repeat` method provides flexibility to set a lower and/or upper bound on the number of occurrences.
+This method indicates that the term can appear any number of times between `min_count` and `max_count` (inclusive).

 ```python
 # Example: Between 2 and 4 word characters
-word_chars = Regex(r"\w").repeat(2, 4)
+word_chars = Regex(r"\w").between(2, 4)
 print(to_regex(word_chars)) # Output: (\w){2,4}
-
-# Example: At least 3 digits (min specified, max left open)
-at_least_three = Regex(r"\d").repeat(3, None)
-print(to_regex(at_least_three)) # Output: (\d){3,}
-
-# Example: Up to 2 punctuation marks (max specified, min omitted)
-up_to_two = Regex(r"[,.]").repeat(None, 2)
-print(to_regex(up_to_two)) # Output: ([,.]){,2}
 ```

-You can also use the `repeat` function:
+You can also use the `between` function:

 ```python
-from outlines import repeat
+from outlines.types import between

 # Example: Between 2 and 4 word characters
-word_chars = repeat(Regex(r"\w"), 2, 4)
+word_chars = between(Regex(r"\w"), 2, 4)
 print(to_regex(word_chars)) # Output: (\w){2,4}
+```
+
+#### `at_least(count)`

-# Example: At least 3 digits (min specified, max left open)
-at_least_three = repeat(Regex(r"\d"), 3, None)
+This method indicates that the term must appear at least `count` times.
+
+```python
+# Example: At least 3 digits
+at_least_three = Regex(r"\d").at_least(3)
 print(to_regex(at_least_three)) # Output: (\d){3,}
+```
+
+You can also use the `at_least` function:
+
+```python
+from outlines.types import at_least
+
+# Example: At least 3 digits
+at_least_three = at_least(Regex(r"\d"), 3)
+print(to_regex(at_least_three)) # Output: (\d){3,}
+```
+
+#### `at_most(count)`
+
+This method indicates that the term can appear at most `count` times.
+
+```python
+# Example: At most 3 digits
+up_to_three = Regex(r"\d").at_most(3)
+print(to_regex(up_to_three)) # Output: (\d){0,3}
+```
+
+You can also use the `at_most` function:
+
+```python
+from outlines.types import at_most

-# Example: Up to 2 punctuation marks (max specified, min omitted)
-up_to_two = repeat(Regex(r"[,.]"), None, 2)
-print(to_regex(up_to_two)) # Output: ([,.]){,2}
+# Example: At most 3 digits
+up_to_three = at_most(Regex(r"\d"), 3)
+print(to_regex(up_to_three)) # Output: (\d){0,3}
 ```

 ---
````
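The three bounded quantifiers in this hunk print `{m,n}`, `{m,}` and `{0,n}` forms. The sketch below checks those printed patterns with Python's `re` module (outlines is not imported) and confirms that both bounds of `between` are inclusive. It also shows that Python accepts the `{,2}` shorthand produced by the removed `repeat(None, 2)` example. Spelling the lower bound out as `{0,3}`, as `at_most` now does, avoids relying on that shorthand, which some other engines (JavaScript's, for one) do not treat as a quantifier; this is offered as a likely benefit, not as the stated reason for the change.

```python
import re

# Patterns printed by the examples in this hunk.
between_2_4 = r"(\w){2,4}"  # between(2, 4): inclusive on both ends
at_least_3 = r"(\d){3,}"    # at_least(3): no upper bound
at_most_3 = r"(\d){0,3}"    # at_most(3): explicit lower bound of zero


def full(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


print([full(between_2_4, s) for s in ["a", "ab", "abcd", "abcde"]])  # [False, True, True, False]
print([full(at_least_3, s) for s in ["12", "123", "1234567"]])       # [False, True, True]
print([full(at_most_3, s) for s in ["", "123", "1234"]])             # [True, True, False]

# Python reads "{,2}" as "{0,2}", so the old and new spellings agree here.
for s in ["", ",", ",.", ",.,"]:
    assert full(r"([,.]){,2}", s) == full(r"([,.]){0,2}", s)
```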
````diff
@@ -186,17 +214,17 @@ pattern = String("hello") + " " + Regex(r"\w+")
 print(to_regex(pattern)) # Output: hello\ (\w+)
 ```

-### Alternation (`|`)
+### Alternation (`either()`)

-The `|` operator creates alternatives, allowing a match for one of several patterns.
+The `either()` function creates alternatives, allowing a match for one of several patterns. You can provide as many terms as you want.

 ```python
-# Example: Match either "cat" or "dog"
-animal = String("cat") | "dog"
-print(to_regex(animal)) # Output: (cat|dog)
+# Example: Match either "cat" or "dog" or "mouse"
+animal = either(String("cat"), "dog", "mouse")
+print(to_regex(animal)) # Output: (cat|dog|mouse)
 ```

-*Note:* When using operators with plain strings (such as `"dog"`), the DSL automatically wraps them in a `String` object and escapes the characters that have a special meaning in regular expressions.
+*Note:* When using `either()` with plain strings (such as `"dog"`), the DSL automatically wraps them in a `String` object that escapes the characters that have a special meaning in regular expressions, just like with quantifier functions.

 ---

````
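The new `either()` text says it accepts any number of terms and escapes plain strings. A hypothetical helper with the same shape can be sketched in a few lines. It handles plain strings only and uses `re.escape`, which is an assumption rather than the escaping outlines actually applies; it returns a pattern string instead of a `Term`.

```python
import re


def either(*alternatives: str) -> str:
    """Hypothetical sketch: escape each literal and join them into one alternation group."""
    if not alternatives:
        raise ValueError("either() needs at least one alternative")
    # Grouping keeps the alternation from leaking into surrounding concatenations.
    return "(" + "|".join(re.escape(alt) for alt in alternatives) + ")"


animal = either("cat", "dog", "mouse")
print(animal)  # (cat|dog|mouse)

# Escaping keeps "." literal, so "1x5" is not accepted as "1.5".
version = either("1.5", "2.0")
print(bool(re.fullmatch(version, "1.5")), bool(re.fullmatch(version, "1x5")))  # True False
```

The outer group matters once the result is concatenated: `a` followed by an ungrouped `b|c` matches `ab` or `c`, whereas `a(b|c)` matches `ab` or `ac`.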
````diff
@@ -223,7 +251,7 @@ For instance you can describe the answers in the GSM8K dataset using the followi
 ```python
 from outlines.types import sentence, digit

-answer = "A: " + sentence.repeat(2,4) + " So the answer is: " + digit.repeat(1,4)
+answer = "A: " + sentence.between(2,4) + " So the answer is: " + digit.between(1,4)
 ```

 ---
````
````diff
@@ -237,7 +265,7 @@ Suppose you want to create a regex that matches an ID format like "ID-12345", wh
 - Followed by exactly 5 digits.

 ```python
-id_pattern = "ID-" + Regex(r"\d").times(5)
+id_pattern = "ID-" + Regex(r"\d").exactly(5)
 print(to_regex(id_pattern)) # Output: ID-(\d){5}
 ```

````
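The `ID-(\d){5}` output in this hunk can be checked against the format described in the surrounding text, an `ID-` prefix followed by exactly five digits, using Python's `re` module. Outlines is not imported here; the sketch only tests the printed pattern string.

```python
import re

id_pattern = r"ID-(\d){5}"  # output shown for "ID-" + Regex(r"\d").exactly(5)

samples = ["ID-12345", "ID-1234", "ID-123456", "id-12345"]
print({s: re.fullmatch(id_pattern, s) is not None for s in samples})
# {'ID-12345': True, 'ID-1234': False, 'ID-123456': False, 'id-12345': False}

# fullmatch matters: re.search also accepts a valid ID embedded in longer text.
assert re.search(id_pattern, "xID-12345x") is not None
assert re.fullmatch(id_pattern, "xID-12345x") is None
```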
````diff
@@ -273,9 +301,9 @@ When used in a Pydantic model, the email field is automatically validated agains
 Consider a pattern to match a simple date format: `YYYY-MM-DD`.

 ```python
-year = Regex(r"\d").times(4) # Four digits for the year
-month = Regex(r"\d").times(2) # Two digits for the month
-day = Regex(r"\d").times(2) # Two digits for the day
+year = Regex(r"\d").exactly(4) # Four digits for the year
+month = Regex(r"\d").exactly(2) # Two digits for the month
+day = Regex(r"\d").exactly(2) # Two digits for the day

 # Combine with literal hyphens
 date_pattern = year + "-" + month + "-" + day
````
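The hunk ends before the printed output of `date_pattern`. Assuming `+` simply concatenates the fragments, as the `ID-(\d){5}` output of the ID example suggests, the result would be `(\d){4}-(\d){2}-(\d){2}`. The sketch below assembles that string by hand (not through outlines) and checks it with Python's `re` module.

```python
import re

# Assumed output of the DSL expression; assembled by hand, not by outlines.
year = r"(\d){4}"
month = r"(\d){2}"
day = r"(\d){2}"
date_pattern = year + "-" + month + "-" + day

print(date_pattern)  # (\d){4}-(\d){2}-(\d){2}
print(bool(re.fullmatch(date_pattern, "2024-01-15")))  # True
print(bool(re.fullmatch(date_pattern, "24-01-15")))    # False
# Only the digit layout is checked, so an impossible date is still accepted.
print(bool(re.fullmatch(date_pattern, "2024-13-99")))  # True
```

If calendar validity matters, a follow-up check on the matched text, for example with `datetime.date.fromisoformat`, is one option.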
