| 1 | +# WARP |
| 2 | + |
| 3 | +**WARP** provides a common format for transferring and applying function information across binary analysis tools. |
| 4 | + |
| 5 | +## WARP Integrations |
| 6 | + |
| 7 | +### Binary Ninja |
| 8 | + |
| 9 | +**WARP** integration is available as an [open-source](https://github.com/Vector35/binaryninja-api/tree/dev/plugins/warp) first-party plugin for [Binary Ninja], so it ships by default. |
| 10 | + |
| 11 | +## Function Identification |
| 12 | + |
| 13 | +Function identification is the main way to interact with **WARP**, allowing tooling to utilize **WARP**'s dataset to identify |
| 14 | +common functions within any binary efficiently and accurately. |
| 15 | + |
| 16 | +### Integration Requirements |
| 17 | + |
| 18 | +To integrate with **WARP** function matching you must be able to: |
| 19 | + |
| 20 | +1. Disassemble instructions |
| 21 | +2. Identify basic blocks that make up a function |
| 22 | +3. Identify register groups with an implicit extend operation |
| 23 | +4. Identify relocatable instructions (see [What is considered a relocatable operand?](#what-is-considered-a-relocatable-operand)) |
| 24 | + |
| 25 | +### Creating a Function GUID |
| 26 | + |
| 27 | +The function GUID is the UUIDv5 of the basic block GUIDs (sorted highest to lowest start address) that make up the function. |
| 28 | + |
| 29 | +#### Example |
| 30 | + |
| 31 | +Given the following sorted basic blocks: |
| 32 | + |
| 33 | +1. `036cccf0-8239-5b84-a811-60efc2d7eeb0` |
| 34 | +2. `3ed5c023-658d-5511-9710-40814f31af50` |
| 35 | +3. `8a076c92-0ba0-540d-b724-7fd5838da9df` |
| 36 | + |
| 37 | +The function GUID will be `7a55be03-76b7-5cb5-bae9-4edcf47795ac`. |
| 38 | + |
| 39 | +##### Example Code |
| 40 | + |
| 41 | +```py |
| 42 | +import uuid |
| 43 | +from hashlib import sha1 |
| 44 | + |
| 45 | +def uuid5(namespace, name_bytes): |
| 46 | +    """Generate a UUID from the SHA-1 hash of a namespace UUID and name bytes.""" |
| 47 | +    digest = sha1(namespace.bytes + name_bytes).digest() |
| 48 | +    return uuid.UUID(bytes=digest[:16], version=5) |
| 49 | + |
| 50 | +function_namespace = uuid.UUID('0192a179-61ac-7cef-88ed-012296e9492f') |
| 51 | +bb1 = uuid.UUID("036cccf0-8239-5b84-a811-60efc2d7eeb0") |
| 52 | +bb2 = uuid.UUID("3ed5c023-658d-5511-9710-40814f31af50") |
| 53 | +bb3 = uuid.UUID("8a076c92-0ba0-540d-b724-7fd5838da9df") |
| 54 | +function = uuid5(function_namespace, bb1.bytes + bb2.bytes + bb3.bytes) |
| 55 | +``` |
| 56 | + |
| 57 | +#### What is the UUIDv5 namespace? |
| 58 | + |
| 59 | +The namespace for function GUIDs is `0192a179-61ac-7cef-88ed-012296e9492f`. |
| 60 | + |
| 61 | +### Creating a Basic Block GUID |
| 62 | + |
| 63 | +The basic block GUID is the UUIDv5 of the byte sequence of the instructions (sorted in execution order) with the following properties: |
| 64 | + |
| 65 | +1. Zero out all instructions containing a relocatable operand. |
| 66 | +2. Exclude all NOP instructions. |
| 67 | +3. Exclude all instructions that set a register to itself if they are effectively NOPs. |
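
The three rules above can be sketched as follows. This is a minimal illustration, not the reference implementation: the `Instruction` record and its flags are hypothetical stand-ins for whatever your disassembler provides, and it assumes that "zeroing" an instruction means replacing every one of its bytes with `0x00` while keeping its length.

```python
import uuid
from dataclasses import dataclass
from hashlib import sha1

# Namespace for basic block GUIDs, as given in this document.
BASIC_BLOCK_NAMESPACE = uuid.UUID("0192a178-7a5f-7936-8653-3cbaa7d6afe7")


@dataclass
class Instruction:
    """Hypothetical instruction record produced by your own disassembler."""
    raw: bytes                    # encoded instruction bytes
    is_relocatable: bool = False  # has an operand pointing into a mapped region
    is_nop: bool = False          # NOP, or an effective-NOP register self-assignment


def uuid5(namespace: uuid.UUID, name_bytes: bytes) -> uuid.UUID:
    """UUIDv5 over raw bytes, mirroring the function GUID example."""
    digest = sha1(namespace.bytes + name_bytes).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


def basic_block_guid(instructions: list[Instruction]) -> uuid.UUID:
    """Hash the filtered and masked instruction bytes of one basic block."""
    data = bytearray()
    for insn in instructions:  # assumed to already be in execution order
        if insn.is_nop:
            continue  # rules 2 and 3: drop NOPs and effective NOPs
        if insn.is_relocatable:
            data += bytes(len(insn.raw))  # rule 1 (assumed): zero every byte
        else:
            data += insn.raw
    return uuid5(BASIC_BLOCK_NAMESPACE, bytes(data))
```

Under these assumptions, two blocks that differ only in a call target or in padding NOPs produce the same GUID.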
| 68 | + |
| 69 | +#### When are instructions that set a register to itself removed? |
| 70 | + |
| 71 | +To support hot-patchable binaries, we must remove these instructions, as the compiler can inject them at the start of a function (see: [1] and [2]). |
| 72 | +This does not affect the accuracy of the function GUID, as they are only removed when the instruction is effectively a NOP: |
| 73 | + |
| 74 | +- Self-assignments of registers in groups with no implicit extension will be removed (see: [3] (under 3.4.1.1)) |
| 75 | + |
| 76 | +For the `x86_64` architecture this means `mov edi, edi` will _not_ be removed, because writing a 32-bit register zero-extends into the upper 32 bits of the 64-bit register, but it _will_ be removed for the `x86` architecture. |
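
A rough sketch of that check, assuming your tooling can report the destination and source registers of a move. The table `IMPLICIT_EXTEND_REGISTERS` and the function name are hypothetical, and the register list is deliberately incomplete (for example, `r8d` through `r15d` are omitted).

```python
# Hypothetical table: registers whose writes implicitly extend into a wider
# register, keyed by architecture. Illustrative only, not exhaustive.
IMPLICIT_EXTEND_REGISTERS = {
    "x86": set(),
    "x86_64": {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"},
}


def is_removable_self_move(arch: str, dst: str, src: str) -> bool:
    """Return True when `mov dst, src` is an effective NOP on `arch`."""
    if dst != src:
        return False  # not a self-assignment at all
    # A self-assignment that implicitly extends changes the wider register,
    # so it is not a NOP and must be kept.
    return dst not in IMPLICIT_EXTEND_REGISTERS[arch]
```

With this table, `is_removable_self_move("x86", "edi", "edi")` is true while `is_removable_self_move("x86_64", "edi", "edi")` is false, matching the behaviour described above.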
| 77 | + |
| 78 | +#### What is considered a relocatable operand? |
| 79 | + |
| 80 | +An operand that is used as a pointer to a mapped region. |
| 81 | + |
| 82 | +For the `x86` architecture the instruction `e8b55b0100` (or `call 0x15bba`) would be zeroed. |
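
To make the example concrete, the sketch below decodes that `call rel32` encoding and zeroes it when the target falls inside a mapped region. The load address of `0` and the `mapped_regions` list are assumptions chosen so the numbers match the example; the helper names are hypothetical, and a real integration would rely on its own disassembler and memory map.

```python
import struct


def call_rel32_target(raw: bytes, address: int) -> int:
    """Resolve the target of an x86 `call rel32` (opcode 0xE8)."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The operand is a signed 32-bit little-endian displacement, relative to
    # the address of the next instruction.
    (displacement,) = struct.unpack("<i", raw[1:])
    return (address + len(raw) + displacement) & 0xFFFFFFFF


def mask_if_relocatable(raw: bytes, address: int, mapped_regions) -> bytes:
    """Zero the instruction when its target points into a mapped region."""
    target = call_rel32_target(raw, address)
    if any(start <= target < end for start, end in mapped_regions):
        return bytes(len(raw))  # same length, all zero bytes
    return raw


raw = bytes.fromhex("e8b55b0100")
target = call_rel32_target(raw, address=0)  # assumes the call sits at address 0
masked = mask_if_relocatable(raw, 0, mapped_regions=[(0x0, 0x20000)])
```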
| 83 | + |
| 84 | +#### What is the UUIDv5 namespace? |
| 85 | + |
| 86 | +The namespace for basic block GUIDs is `0192a178-7a5f-7936-8653-3cbaa7d6afe7`. |
| 87 | + |
| 88 | +### Function Constraints |
| 89 | + |
| 90 | +Function constraints allow us to further disambiguate between functions with the same GUID. When creating the functions, we store information about the following: |
| 91 | + |
| 92 | +- Called functions |
| 93 | +- Caller functions |
| 94 | +- Adjacent functions |
| 95 | + |
| 96 | +Each entry in the lists above is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID. |
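
One way to picture this narrowing step is shown below. It is only an illustration and not necessarily how any WARP integration scores matches: the `Candidate` type is hypothetical, constraints are modeled as a set of GUIDs, and the rule "most shared constraints wins, ties are rejected" is an assumption.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical stored function that shares the looked-up GUID."""
    name: str
    constraints: set[uuid.UUID] = field(default_factory=set)


def best_match(candidates: list[Candidate], observed: set[uuid.UUID]):
    """Return the candidate sharing the most constraints, or None if unclear."""
    scored = sorted(
        ((len(c.constraints & observed), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None  # nothing to go on
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # ambiguous: two candidates fit equally well
    return scored[0][1]
```

Returning `None` on a tie keeps the sketch conservative: an ambiguous result is left unmatched instead of guessed.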
| 97 | + |
| 98 | +#### Why don't we require matching on constraints for trivial functions? |
| 99 | + |
| 100 | +The decision to match on constraints is left to the user. While requiring constraint matching for functions |
| 101 | +from all datasets can reduce false positives, it may not always be necessary. For example, when transferring functions |
| 102 | +from one version of a binary to another version of the same binary, not matching on constraints for trivial functions |
| 103 | +might be acceptable. |
| 104 | + |
| 105 | +## Comparison of Function Recognition Tools |
| 106 | + |
| 107 | +### WARP vs FLIRT |
| 108 | + |
| 109 | +The main difference between **WARP** and **FLIRT** is the approach to identification. |
| 110 | + |
| 111 | +#### Function Identification |
| 112 | + |
| 113 | +- **WARP** identifies functions by a GUID computed from the function's basic blocks, as described [here](#function-identification). |
| 114 | +- **FLIRT** uses an incomplete byte sequence of the function, with a mask, where there is a single function entry (see: [IDA FLIRT Documentation] for a full description). |
| 115 | + |
| 116 | +What this means in practice is that **WARP** will have fewer false positives based solely on the initial function identification. |
| 117 | +When more than one function is returned, we can use the list of [Function Constraints](#function-constraints) to select the best possible match. |
| 118 | +However, this comes at a cost: a GUID must be computed whenever a lookup is requested, and the computed function GUID must _**always**_ be the same for a given function. |
| 119 | + |
| 120 | + |
| 121 | +[1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=9583 |
| 122 | +[2]: https://devblogs.microsoft.com/oldnewthing/20221109-00/?p=107373 |
| 123 | +[3]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf |
| 124 | +[IDA FLIRT Documentation]: https://docs.hex-rays.com/user-guide/signatures/flirt/ida-f.l.i.r.t.-technology-in-depth |
| 125 | +[Binary Ninja]: https://binary.ninja |
| 126 | +[Binary Ninja Integration]: https://github.com/Vector35/binaryninja-api/tree/dev/plugins/warp |