Skip to content

Latest commit

 

History

History
102 lines (90 loc) · 2.72 KB

README.md

File metadata and controls

102 lines (90 loc) · 2.72 KB

Rajax: Regex Bytecode Compiler

This module turns regular expressions into bytecode for execution on the virtual machine invented by Ken Thompson and described by Russ Cox. It also produces a Graphviz representation of the abstract syntax tree (with some nodes removed).

Some of the VM instructions (NCHAR, WILD) are not documented by Cox. NCHAR means "match anything but this" and WILD is the wildcard.

Usage

Usage: Compile a regular expression into NFA instructions

Options:
  -h, --help            show this help message and exit
  -d DOT, --dot=DOT     Write the AST as a Graphviz dot file
  -f FORMAT, --format=FORMAT
                        Output format, either "pretty" or "json"
  -j, --json            Alias for --format=json
  -p PDF, --pdf=PDF     Write the AST as a PDF file. Implies
                        --dot=FILENAME.dot.
  -v, --verbose         Print debugging information

Install

pip install -e git+https://github.com/cwru-compilers/regex_compiler.git#egg=rajax

Example

rajax ab[\D8]+

Tokens:
'ORD_CHAR' 'a'
'ORD_CHAR' 'b'
'LBRACK' '['
'ES_SPECIAL' 'D'
'ORD_CHAR' '8'
'RBRACK' ']'
'PLUS' '+'
-------
Graphviz written to AST.dot
PDF written to AST.pdf
-------
Instructions for VM:
 0: char    a ZERO
 1: char    b ZERO
 2: split   3   5
 3: split   4   7
 4: jmp     9
 5: char    : INF
 6: jmp    10
 7: char  ZERO   /
 8: jmp    10
 9: char    8 ZERO
10: split   2  11
11: match

AST.dot will look like this:

digraph AST {
    node [shape=box];
    {
        rank=same; 
        1 [label="re_expr_concat: ", style=filled, fillcolor="#eeeeee"];
    }
    {
        rank=same; 
        2 [label="re_expr_concat: ", style=filled, fillcolor="#eeeeee"];
        5 [label="simple_re_dup: +", style=filled, fillcolor="#ccffcc"];
    }
    {
        rank=same; 
        3 [label="nondup_re_char: a", style=filled, fillcolor="#ffffcc"];
        4 [label="nondup_re_char: b", style=filled, fillcolor="#ffffcc"];
        6 [label="brack_expr_matching_list: ", style=filled, fillcolor="#ffffff"];
    }
    {
        rank=same; 
        7 [label="follow_list_multi: ", style=filled, fillcolor="#cccccc"];
    }
    {
        rank=same; 
        8 [label="char_class: D", style=filled, fillcolor="#ffcccc"];
        9 [label="end_range_char: 8", style=filled, fillcolor="#ccccff"];
    }
    1 -> 2;
    1 -> 5;
    2 -> 3;
    2 -> 4;
    5 -> 6;
    6 -> 7;
    7 -> 8;
    7 -> 9;
}