Commit a776897

Enhance repetition combinators:
- Redesigned repeat operations with new directives: `repeat<min,max>`, `at_least<min>`, `at_most<max>`, and `exactly<count>`, improving expressiveness and performance.
- Updated documentation and tests to reflect new features and ensure comprehensive coverage of the changes.
1 parent 6761f7f commit a776897
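
The matching behaviour the commit message describes for the four directives can be modelled outside of lug. The sketch below is a standalone illustration and not lug's implementation: `repeat_model`, `at_least_model`, `at_most_model`, `exactly_model`, `unbounded` and `is_digit_char` are hypothetical names invented here, and it assumes greedy, non-backtracking repetition over a single-character predicate, as is usual for PEGs.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for "no upper bound"; not a lug identifier.
inline constexpr std::size_t unbounded = static_cast<std::size_t>(-1);

// Hypothetical model of repeat<Min,Max>[e] where e matches one character
// accepted by `pred`. Consumes greedily, stops after Max occurrences, and
// fails (std::nullopt) when fewer than Min occurrences were consumed.
// On success, returns the input position just past the match.
template <std::size_t Min, std::size_t Max, class Pred>
std::optional<std::size_t> repeat_model(std::string_view in, std::size_t pos, Pred pred)
{
    static_assert(Min <= Max, "minimum must not exceed maximum");
    std::size_t count = 0;
    while (count < Max && pos < in.size() && pred(in[pos])) {
        ++pos;
        ++count;
    }
    if (count < Min)
        return std::nullopt;
    return pos;
}

// The other three directives expressed as special cases of the bounded form.
template <std::size_t N, class Pred>
std::optional<std::size_t> at_least_model(std::string_view in, std::size_t pos, Pred pred)
{
    return repeat_model<N, unbounded>(in, pos, pred);
}

template <std::size_t N, class Pred>
std::optional<std::size_t> at_most_model(std::string_view in, std::size_t pos, Pred pred)
{
    return repeat_model<0, N>(in, pos, pred);
}

template <std::size_t N, class Pred>
std::optional<std::size_t> exactly_model(std::string_view in, std::size_t pos, Pred pred)
{
    return repeat_model<N, N>(in, pos, pred);
}

// Example predicate: ASCII decimal digits only.
inline bool is_digit_char(char c)
{
    return c >= '0' && c <= '9';
}
```

Under these assumptions, `exactly_model<3>("1234", 0, is_digit_char)` stops after three digits and leaves the fourth unconsumed, while `at_most_model<2>("abc", 0, is_digit_char)` succeeds without consuming anything. Passing the bounds as template arguments lets an inverted range be rejected at compile time; whether lug performs a similar check is not stated in this commit.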

7 files changed, +621 -171 lines changed

CHANGELOG.md

Lines changed: 8 additions & 1 deletion
@@ -2,7 +2,14 @@
 
 ## Release v0.6.0 (Under Development)
 
-## Release v0.5.0 (March 18th, 2025)
+* Completely redesigned repeat mechanism with a highly optimized implementation that dramatically reduces bytecode size and significantly improves parsing performance for repetitive patterns. Introduced more expressive control directives: `repeat<min,max>[e]` for bounded repetition, `at_least<min>[e]` for minimum repetition, `at_most<max>[e]` for maximum repetition, and `exactly<count>[e]` for fixed repetition.
+* Added specialized repeat opcodes that eliminate redundant stack operations and reduce instruction count for common repetition patterns.
+* Implemented tail-call optimization for repeat operations to prevent stack overflow on deeply nested repetitions.
+* Implemented an optimized whitespace skipping mechanism, replacing the previous implementation for better performance.
+* Fixed critical issues with `eol` and `eoi` combinators that were incorrectly interacting with whitespace skipping logic.
+* Added specialized fast paths for common whitespace patterns to improve parsing speed in typical scenarios.
+
+## Release v0.5.0 (March 18, 2025)
 
 * Implemented collection and object attribute directives. The new `collect<C>[e]` directive synthesizes a sequence or associative container type `C` consisting of elements gathered from the inherited or synthesized attributes in expression `e`. Likewise, there also new `synthesize<C,A...>[e]`, `synthesize_shared<C,A...>[e]` and `synthesize_unique<C,A...>[e]` directives for synthesizing objects, shared pointers and unique pointers respectively, constructed from the component attributes in expression `e`.
 * Implemented `synthesize_collect` directive that combines `collect` and `synthesize` directives together for improved code readability and reduced boilerplate when constructing complex data structures from parsed elements. This is particularly useful for building nested collections like arrays of objects or maps with complex value types.

README.md

Lines changed: 9 additions & 7 deletions
@@ -6,7 +6,7 @@ lug
 [![Tidy](https://github.com/jwtowner/lug/actions/workflows/tidy.yml/badge.svg)](https://github.com/jwtowner/lug/actions/workflows/tidy.yml)
 [![License](https://img.shields.io/packagist/l/doctrine/orm.svg)](https://github.com/jwtowner/lug/blob/master/LICENSE.md)
 ===
-A C++ embedded domain specific language for expressing parsers as extended [parsing expression grammars (PEGs)](https://en.wikipedia.org/wiki/Parsing_expression_grammar)
+A C++ embedded domain specific language for [parsing expression grammars (PEGs)](https://en.wikipedia.org/wiki/Parsing_expression_grammar)
 
 ![lug](https://github.com/jwtowner/lug/raw/master/doc/lug_logo_large.png)
 
@@ -15,7 +15,7 @@ Features
 - Natural syntax resembling external parser generator languages, with support for attributes and semantic actions.
 - Ability to handle context-sensitive grammars with symbol tables, conditions and syntactic predicates.
 - Generated parsers are compiled to special-purpose bytecode and executed in a virtual parsing machine.
-- Clear separation of syntactic and lexical rules, with the ability to customize implicit whitespace skipping.
+- Implicit whitespace skipping with clear separation of syntactic and lexical rules.
 - Support for direct and indirect left recursion, with precedence levels to disambiguate subexpressions with mixed left/right recursion.
 - Full support for UTF-8 text parsing, including Level 1 and partial Level 2 compliance with the UTS #18 Unicode Regular Expressions technical standard.
 - Error handling and recovery with labeled failures, recovery rules and error handlers.
@@ -161,8 +161,10 @@ Quick Reference
 | `skip⁠[e]` | Turns on all whitespace skipping for subexpression *e* (the default). |
 | `noskip⁠[e]` | Turns off all whitespace skipping for subexpression *e*, including preceeding whitespace. |
 | `lexeme⁠[e]` | Treats subexpression *e* as a lexical token with no internal whitespace skipping. |
-| `repeat(N)⁠[e]` | Matches exactly *N* occurences of expression *e*. |
-| `repeat(N,M)⁠[e]` | Matches at least *N* and at most *M* occurences of expression *e*. |
+| `repeat<N,M>⁠[e]` | Matches at least *N* and at most *M* occurences of expression *e*. |
+| `at_least<N>⁠[e]` | Matches at least *N* occurences of expression *e*. |
+| `at_most<N>⁠[e]` | Matches at most *N* occurences of expression *e*. |
+| `exactly<N>⁠[e]` | Matches exactly *N* occurences of expression *e*. |
 | `on(C)⁠[e]` | Sets the condition *C* to true for the scope of subexpression *e*. |
 | `off(C)⁠[e]` | Sets the condition *C* to false for the scope of subexpression *e* (the default). |
 | `symbol(S)⁠[e]` | Pushes a symbol definition for symbol *S* with value equal to the captured input matching subexpression *e*. |
@@ -172,6 +174,7 @@ Quick Reference
 | `collect<C>⁠[e]` | Synthesizes a collection attribute of container type *C* from the attributes inherited from or synthesized within expression *e*. |
 | `collect<C,A...>⁠[e]` | Synthesizes a collection attribute of container type *C* consisting of elements, each of which are constructed from sequences of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |
 | `synthesize<T,A...>⁠[e]` | Synthesizes an object of type *T* constructed from a sequence of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |
+| `synthesize_collect<T,C,A...>⁠[e]` | Synthesizes an object of type *T* constructed from a container of type *C* composed of the attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |
 | `synthesize_shared<T>⁠[e]` | Synthesizes an object of type `std::shared_ptr<T>` by calling `std::make_shared` passing in an attribute of type *T* inherited from or synthesized within expression *e*. |
 | `synthesize_shared<T,A...>⁠[e]` | Synthesizes an object of type `std::shared_ptr<T>` by calling `std::make_shared` passing in a sequence of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |
 | `synthesize_unique<T>⁠[e]` | Synthesizes an object of type `std::unique_ptr<T>` by calling `std::make_unique` passing in an attribute of type *T* inherited from or synthesized within expression *e*. |
@@ -189,10 +192,9 @@ Quick Reference
 
 | Terminal | Description |
 | --- | --- |
-| `nop` | No operation, does not emit any instructions. |
-| `eps` | Matches the empty string. |
 | `eoi` | Matches the end of the input sequence. |
-| `eol` | Matches a Unicode line-ending. |
+| `eol` | Matches a line-ending. |
+| `eps` | Matches the empty string. Equivalent to no-operation. |
 | `cut` | Emits a cut operation, accepting semantic actions up to current match prefix unless there were syntax errors, and draining the input source. |
 | `accept` | Accepts all semantic actions up to current match prefix, even after recovering from syntax errors. Does not drain the input source. |
 | `raise⁠(f)` | Raises the labeled failure *f* to be handled by the top level error handler and recovery rule. |
