|
| 1 | +# RzIL |
| 2 | + |
| 3 | +RzIL is the new intermediate language in Rizin, primarily intended for representing the semantics of machine code. It is designed as a clone of BAP's [Core Theory](http://binaryanalysisplatform.github.io/bap/api/master/bap-core-theory/Bap_core_theory/), with minor deviations where necessary; it is worth noting that in practice, RzIL is very similar to the SMT representation with bitvectors and bitvector-indexed arrays as well as effects. |
| 4 | + |
| 5 | +More details related to the implementation can be found [here](https://github.com/rizinorg/rizin/blob/dev/doc/rzil.md). |
| 6 | + |
| 7 | +## IL statements |
| 8 | + |
| 9 | +Instructions can are internally represented as [IL statements](https://github.com/rizinorg/rizin/blob/dev/librz/include/rz_il/rz_il_opcodes.h); these statements are expressed as **s-expressions** (symbolic expression) which utilize LISP-like syntax as string format. |
| 10 | + |
| 11 | +- `aoi` generates a **one-line** LISP-like syntax (JSON format is available via `aoj` command). |
| 12 | +- `aoip` generates a **prettified** LISP-like syntax |
| 13 | + |
| 14 | +Here an example using `aoip` (the *prettified* output) of two PowerPC instructions (`stwu` and `mflr`). |
| 15 | + |
| 16 | +```bash |
| 17 | +[0x10000488]> pd 2 |
| 18 | +| 0x10000488 stwu r1, -0x10(r1) |
| 19 | +| 0x1000048c mflr r0 |
| 20 | +[0x10000488]> aoi? |
| 21 | +Usage: aoi[p] # Print the RzIL of next N instructions |
| 22 | +| aoi [<n_instructions>] # Print the RzIL of next N instructions |
| 23 | +| aoip [<n_instructions>] # Pretty print the RzIL of next N instructions |
| 24 | +[0x10000488]> aoip 2 |
| 25 | +0x10000488 |
| 26 | +(seq |
| 27 | + (storew 0 |
| 28 | + (+ |
| 29 | + (var r1) |
| 30 | + (let v |
| 31 | + (bv 16 0xfff0) |
| 32 | + (ite |
| 33 | + (msb |
| 34 | + (var v)) |
| 35 | + (cast 32 |
| 36 | + (msb |
| 37 | + (var v)) |
| 38 | + (var v)) |
| 39 | + (cast 32 |
| 40 | + false |
| 41 | + (var v))))) |
| 42 | + (cast 32 |
| 43 | + false |
| 44 | + (var r1))) |
| 45 | + (set r1 |
| 46 | + (+ |
| 47 | + (var r1) |
| 48 | + (let v |
| 49 | + (bv 16 0xfff0) |
| 50 | + (ite |
| 51 | + (msb |
| 52 | + (var v)) |
| 53 | + (cast 32 |
| 54 | + (msb |
| 55 | + (var v)) |
| 56 | + (var v)) |
| 57 | + (cast 32 |
| 58 | + false |
| 59 | + (var v))))))) |
| 60 | +0x1000048c |
| 61 | +(set r0 |
| 62 | + (cast 32 |
| 63 | + false |
| 64 | + (var lr))) |
| 65 | +``` |
| 66 | + |
| 67 | +The `plf` command allows to generate an in-line representation of the entire function in s-expressions. |
| 68 | + |
| 69 | +```bash |
| 70 | +[0x100002bc]> pdf @ sym.example |
| 71 | +/ sym.example(); |
| 72 | +| ; var int32_t var_1ch @ stack - 0x1c |
| 73 | +| 0x1000044c stwu r1, -0x10(r1) |
| 74 | +| 0x10000450 mflr r0 |
| 75 | +| 0x10000454 stw r0, 0x14(r1) |
| 76 | +| 0x10000458 lwz r0, 0x14(r1) |
| 77 | +| 0x1000045c addi r1, r1, 0x10 |
| 78 | +| 0x10000460 mtlr r0 |
| 79 | +\ 0x10000464 blr |
| 80 | +[0x100002bc]> plf @ sym.example |
| 81 | +0x1000044c (seq (storew 0 (+ (var r1) (let v (bv 16 0xfff0) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) (cast 32 false (var r1))) (set r1 (+ (var r1) (let v (bv 16 0xfff0) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))))) |
| 82 | +0x10000450 (set r0 (cast 32 false (var lr))) |
| 83 | +0x10000454 (seq (storew 0 (+ (var r1) (let v (bv 16 0x14) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) (cast 32 false (var r0))) empty) |
| 84 | +0x10000458 (seq (set r0 (let ea (+ (var r1) (let v (bv 16 0x14) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) (let loadw (loadw 0 32 (var ea)) (cast 32 false (var loadw))))) empty) |
| 85 | +0x1000045c (seq (set a (var r1)) (set b (let v (bv 16 0x10) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) empty (set r1 (+ (var a) (var b))) empty empty empty) |
| 86 | +0x10000460 (set lr (cast 32 false (var r0))) |
| 87 | +0x10000464 (seq (set CIA (bv 32 0x10000464)) empty empty (set NIA (& (bv 32 0xfffffffc) (var lr))) (jmp (var NIA))) |
| 88 | +[0x100002bc]> |
| 89 | +``` |
| 90 | + |
| 91 | +The same output of `aoi` can be obtained via `rz-asm` like this: |
| 92 | + |
| 93 | +```bash |
| 94 | +$ rz-asm -de -a ppc 7c0802a6 |
| 95 | +mflr r0 |
| 96 | +$ rz-asm -Ie -a ppc 7c0802a6 |
| 97 | +(set r0 (cast 32 false (var lr))) |
| 98 | +``` |
| 99 | + |
| 100 | +## Emulation |
| 101 | + |
| 102 | +Rizin enables instruction emulation by leveraging RzIL. The emulation can be used to record changes within the VM, like read and writes of registers and memory locations (`e io.buffers=true` is required for memory ops). The emulation is controlled via the `aez` commands. |
| 103 | + |
| 104 | +```bash |
| 105 | +[0x00000000]> aez? |
| 106 | +Usage: aez<isv?> # RzIL Emulation |
| 107 | +| aezi # Initialize the RzIL Virtual Machine at the current offset |
| 108 | +| aezs [<n_times>] # Step N instructions within the RzIL Virtual Machine |
| 109 | +| aezse[j] [<n_times>] # Step N instructions within the RzIL VM and output VM changes (read & |
| 110 | + write) |
| 111 | +| aezsu <address> # Step until PC equals given address |
| 112 | +| aezsue <address> # Step until PC equals given address and output VM changes (read & write) |
| 113 | +| aezv[jqt] [<var_name> [<number>]] # Print or modify the current status of the RzIL Virtual Machine |
| 114 | +``` |
| 115 | +
|
| 116 | +Supported architectures can be inspected via the `La` command. If the architecture has an `I`, as in the example below, it supports RzIL. |
| 117 | +
|
| 118 | +```bash |
| 119 | +_dAeI 32 64 ppc BSD Capstone PowerPC disassembler |
| 120 | +``` |
| 121 | +
|
| 122 | +Here is an example of emulation of a PowerPC binary printing a string via *printf*. |
| 123 | +
|
| 124 | +In this example, `r9` contains the base address which is used to calculate the pointer to the string (stored in `r3`) used by 'reloc.printf'. |
| 125 | +
|
| 126 | +```bash |
| 127 | +[0x1000049c]> pd 3 |
| 128 | +| 0x1000049c lis r9, 0x1000 |
| 129 | +| 0x100004a0 addi r3, r9, 0x640 |
| 130 | +| 0x100004a4 bl reloc.printf |
| 131 | +``` |
| 132 | +
|
| 133 | +First we need to initialize the RzIL Virtual Machine at the current offset using `aezi` |
| 134 | +
|
| 135 | +```bash |
| 136 | +[0x1000049c]> aezi? |
| 137 | +Usage: aezi # Initialize the RzIL Virtual Machine at the current offset |
| 138 | +[0x1000049c]> aezi |
| 139 | +``` |
| 140 | +
|
| 141 | +Then we execute 2 instructions via `aezs` (quiet) or use `aezse` to see the actual changes within the RzIL VM. |
| 142 | +
|
| 143 | +```bash |
| 144 | +[0x1000049c]> aezse? |
| 145 | +Usage: aezse[j] [<n_times>] # Step N instructions within the RzIL VM and output VM changes (read & write) |
| 146 | +[0x1000049c]> aezse 2 # execute 2 instructions |
| 147 | +pc_write(old: 0x1000049c, new: 0x100004a0) |
| 148 | +var_write(name: r9, old: 0x0, new: 0x10000000) |
| 149 | +pc_write(old: 0x100004a0, new: 0x100004a4) |
| 150 | +var_write(name: r3, old: 0x0, new: 0x10000640) |
| 151 | +``` |
| 152 | +
|
| 153 | +It's possible to see (or modify) the values of the registers in the RzIL VM via `aezv`. |
| 154 | +
|
| 155 | +```bash |
| 156 | +[0x1000049c]> # We can also print the content of the RzIL VM via 'aezv' |
| 157 | +[0x1000049c]> aezv? |
| 158 | +Usage: aezv[jqt] [<var_name> [<number>]] # Print or modify the current status of the RzIL Virtual Machine |
| 159 | +[0x1000049c]> aezv r3 |
| 160 | + r3: 0x10000640 |
| 161 | +[0x1000049c]> aezv r9 |
| 162 | + r9: 0x10000000 |
| 163 | +``` |
| 164 | +
|
| 165 | +Now that we know that the string is situated at `0x10000640`, we can print it. |
| 166 | +
|
| 167 | +```bash |
| 168 | +[0x1000049c]> # hexdump the content of address 0x10000640 with a buffer size of 0x20 bytes. |
| 169 | +[0x1000049c]> px @ 0x10000640 @! 0x20 |
| 170 | +- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF |
| 171 | +0x10000640 5369 6d70 6c65 2050 5043 2070 726f 6772 Simple PPC progr |
| 172 | +0x10000650 616d 2e00 0000 0000 ffff ffff ffff ffff am.............. |
| 173 | +[0x1000049c]> |
| 174 | +[0x1000049c]> # Decode and print the utf-8 string at address 0x10000640 |
| 175 | +[0x1000049c]> ps @ 0x10000640 |
| 176 | +Simple PPC program. |
| 177 | +[0x1000049c]> |
| 178 | +``` |
0 commit comments