Commit

Add README
emesare committed Oct 31, 2024
1 parent 9f525e0 commit 2178a87
Showing 1 changed file with 126 additions and 0 deletions.
126 changes: 126 additions & 0 deletions README.md
@@ -0,0 +1,126 @@
# WARP

**WARP** provides a common format for transferring and applying function information across binary analysis tools.

## WARP Integrations

### Binary Ninja

WARP integration is available as an [open source](https://github.com/Vector35/binaryninja-api/tree/dev/plugins/warp) first-party plugin for [Binary Ninja] and as such ships by default.

## Function Identification

Function identification is the main way to interact with **WARP**, allowing tooling to utilize **WARP**'s dataset to identify
common functions within any binary efficiently and accurately.

### Integration Requirements

To integrate with **WARP** function matching you must be able to:

1. Disassemble instructions
2. Identify basic blocks that make up a function
3. Identify register groups with an implicit extend operation
4. Identify relocatable instructions (see [What is considered a relocatable operand?](#what-is-considered-a-relocatable-operand))

### Creating a Function GUID

The function GUID is the UUIDv5 of the concatenated basic block GUIDs (sorted highest to lowest start address) that make up the function.

#### Example

Given the following sorted basic blocks:

1. `036cccf0-8239-5b84-a811-60efc2d7eeb0`
2. `3ed5c023-658d-5511-9710-40814f31af50`
3. `8a076c92-0ba0-540d-b724-7fd5838da9df`

The function GUID will be `7a55be03-76b7-5cb5-bae9-4edcf47795ac`.

##### Example Code

```py
import uuid
from hashlib import sha1


def uuid5(namespace, name_bytes):
    """Generate a UUIDv5 from the SHA-1 hash of a namespace UUID and the name bytes."""
    # Hash the namespace bytes followed by the raw name bytes.
    digest = sha1(namespace.bytes + name_bytes).digest()
    # version=5 stamps the version and variant bits onto the truncated digest.
    return uuid.UUID(bytes=digest[:16], version=5)


function_namespace = uuid.UUID("0192a179-61ac-7cef-88ed-012296e9492f")
bb1 = uuid.UUID("036cccf0-8239-5b84-a811-60efc2d7eeb0")
bb2 = uuid.UUID("3ed5c023-658d-5511-9710-40814f31af50")
bb3 = uuid.UUID("8a076c92-0ba0-540d-b724-7fd5838da9df")
# The name is the concatenation of the basic block GUID bytes, in sorted order.
function = uuid5(function_namespace, bb1.bytes + bb2.bytes + bb3.bytes)
```

#### What is the UUIDv5 namespace?

The namespace for function GUIDs is `0192a179-61ac-7cef-88ed-012296e9492f`.
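
The ordering, concatenation, and namespace described in this section can be combined into a single helper. The following is a minimal sketch, not the reference implementation: the `(start_address, guid)` pair shape and the names `function_guid` and `uuid5_bytes` are assumptions made for illustration.

```py
import uuid
from hashlib import sha1

# Namespace for function GUIDs, as stated in this README.
FUNCTION_NAMESPACE = uuid.UUID("0192a179-61ac-7cef-88ed-012296e9492f")


def uuid5_bytes(namespace, name_bytes):
    """UUIDv5 over raw bytes: SHA-1 of namespace + name, truncated to 16 bytes."""
    digest = sha1(namespace.bytes + name_bytes).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


def function_guid(blocks):
    """Compute a function GUID from (start_address, basic_block_guid) pairs.

    The pair shape is hypothetical; any order of input is accepted because
    the blocks are sorted here, highest start address first.
    """
    ordered = sorted(blocks, key=lambda block: block[0], reverse=True)
    name = b"".join(guid.bytes for _, guid in ordered)
    return uuid5_bytes(FUNCTION_NAMESPACE, name)
```

Sorting inside the helper means callers do not need to pre-order the blocks, which keeps the result independent of the order in which an analysis tool happens to discover them.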

### Creating a Basic Block GUID

The basic block GUID is the UUIDv5 of the byte sequence of the instructions (sorted in execution order) with the following properties:

1. Zero out all instructions containing a relocatable operand.
2. Exclude all NOP instructions.
3. Exclude all instructions that set a register to itself if they are effectively NOPs.

#### When are instructions that set a register to itself removed?

To support hot-patching we must remove them as they can be injected by the compiler at the start of a function (see: [1] and [2]).
This does not affect the accuracy of the function GUID as they are only removed when the instruction is a NOP:

- Instructions that set a register to itself are removed only when the register belongs to a register group with no implicit extension (see: [3], section 3.4.1.1)

For the `x86_64` architecture this means `mov edi, edi` will _not_ be removed, but it _will_ be removed for the `x86` architecture.
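
The rule can be sketched as a small predicate. This is an illustration only: the register table is hypothetical and heavily abbreviated, and a real integration would query its own architecture description instead.

```py
# Hypothetical, abbreviated register tables: register name -> width in bits.
REGISTER_BITS = {
    "x86": {"di": 16, "edi": 32},
    "x86_64": {"di": 16, "edi": 32, "rdi": 64},
}


def has_implicit_extension(arch, register):
    """Return True if writing `register` also changes a wider register.

    On x86_64, writing a 32-bit register zero-extends into the full 64-bit
    register; 8-bit and 16-bit writes leave the upper bits untouched.
    """
    return arch == "x86_64" and REGISTER_BITS[arch][register] == 32


def is_removable_self_move(arch, dest, src):
    """A register-to-itself move is excluded only if it is effectively a NOP."""
    return dest == src and not has_implicit_extension(arch, dest)
```

With this table, `is_removable_self_move("x86", "edi", "edi")` is true while `is_removable_self_move("x86_64", "edi", "edi")` is false, matching the `mov edi, edi` case described above.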

#### What is considered a relocatable operand?

An operand that is used as a pointer to a mapped region.

For the `x86` architecture the instruction `e8b55b0100` (or `call 0x15bba`) would be zeroed.
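
As a worked example of that instruction, the sketch below decodes the relative call and then zeroes it. It assumes the instruction sits at address `0`, which is what makes the target come out as `0x15bba`; the helper names are made up for illustration.

```py
def call_rel32_target(raw, address):
    """Resolve the target of an x86 `call rel32` (opcode 0xE8)."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The displacement is a signed little-endian 32-bit value,
    # relative to the address of the next instruction.
    displacement = int.from_bytes(raw[1:5], "little", signed=True)
    return address + len(raw) + displacement


def zero_instruction(raw):
    """Replace every byte of the instruction with zero, keeping its length."""
    return bytes(len(raw))


raw = bytes.fromhex("e8b55b0100")
target = call_rel32_target(raw, address=0)  # assumed load address of 0
zeroed = zero_instruction(raw)
```

Zeroing the whole instruction rather than only the operand bytes follows the rule stated earlier ("zero out all instructions containing a relocatable operand").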

#### What is the UUIDv5 namespace?

The namespace for basic block GUIDs is `0192a178-7a5f-7936-8653-3cbaa7d6afe7`.
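
Putting the rules of this section together, a basic block GUID could be computed roughly as follows. This is a sketch under stated assumptions: the `Instruction` record and its flag names are hypothetical, and the flags are expected to be filled in by the integrating tool's own disassembler and analysis.

```py
import uuid
from hashlib import sha1
from typing import NamedTuple

# Namespace for basic block GUIDs, as stated in this README.
BASIC_BLOCK_NAMESPACE = uuid.UUID("0192a178-7a5f-7936-8653-3cbaa7d6afe7")


class Instruction(NamedTuple):
    # Hypothetical record; the flags come from the integrating tool.
    raw: bytes
    has_relocatable_operand: bool = False
    is_nop: bool = False
    is_self_assignment_nop: bool = False


def basic_block_guid(instructions):
    """Compute a basic block GUID; `instructions` must be in execution order."""
    name = bytearray()
    for insn in instructions:
        # Rules 2 and 3: excluded instructions contribute no bytes at all.
        if insn.is_nop or insn.is_self_assignment_nop:
            continue
        if insn.has_relocatable_operand:
            # Rule 1: keep the instruction's length but zero its bytes.
            name += bytes(len(insn.raw))
        else:
            name += insn.raw
    digest = sha1(BASIC_BLOCK_NAMESPACE.bytes + bytes(name)).digest()
    return uuid.UUID(bytes=digest[:16], version=5)
```

Checking the exclusion rules before the zeroing rule is a choice made for this sketch; the text above does not say which takes precedence if both apply to one instruction.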

### Function Constraints

Function constraints allow us to further disambiguate between functions with the same GUID. When creating the functions we store information about the following:

- Called functions
- Caller functions
- Adjacent functions

Each entry in the lists above is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID.
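
One way to picture this narrowing step is shown below. It is a hedged sketch, not WARP's matching algorithm: constraints are modelled as plain strings, and ranking candidates by the number of shared constraints is an assumption made for illustration.

```py
def narrow_by_constraints(candidates, observed):
    """Keep the candidates that share the most constraints with `observed`.

    candidates: dict mapping a candidate name to its set of stored constraints.
    observed:   set of constraints seen around the function being matched.
    """
    scores = {
        name: len(constraints & observed)
        for name, constraints in candidates.items()
    }
    best = max(scores.values(), default=0)
    if best == 0:
        # No constraint helps, so every candidate remains possible.
        return sorted(candidates)
    return sorted(name for name, score in scores.items() if score == best)
```

For example, if two candidates share a GUID but only one of them was recorded as calling the same functions as the function under analysis, only that candidate is returned.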

#### Why don't we require matching on constraints for trivial functions?

The decision to match on constraints is left to the user. While requiring constraint matching for functions
from all datasets can reduce false positives, it may not always be necessary. For example, when transferring functions
from one version of a binary to another version of the same binary, not matching on constraints for trivial functions
might be acceptable.

## Comparison of Function Recognition Tools

### WARP vs FLIRT

The main difference between **WARP** and **FLIRT** is the approach to identification.

#### Function Identification

- **WARP** identifies functions by GUID, as described in [Function Identification](#function-identification).
- **FLIRT** uses an incomplete function byte sequence with a mask, where there is a single function entry (see: [IDA FLIRT Documentation] for a full description).

What this means in practice is that **WARP** will have fewer false positives based solely on the initial function identification.
When more than one function is returned, we can use the list of [Function Constraints](#function-constraints) to select the best possible match.
However, this comes at the cost of computing a GUID whenever a lookup is requested, and it requires that the GUID computed for a given function is _**always**_ the same.
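
The difference in lookup style can be illustrated with a deliberately simplified sketch. Neither function reflects the actual WARP or FLIRT file formats: the dataset is modelled as a plain dictionary, and the masked comparison is a generic illustration of matching a byte sequence with a mask.

```py
import uuid


def lookup_by_guid(dataset, guid):
    """Exact-key lookup: the GUID must be computed before it can be used."""
    return dataset.get(guid, [])


def masked_match(pattern, mask, code):
    """Compare leading bytes of `code` to `pattern` wherever `mask` has bits set.

    A mask byte of 0x00 acts as a wildcard; 0xFF requires an exact byte match.
    """
    if len(mask) != len(pattern) or len(code) < len(pattern):
        return False
    return all((c & m) == (p & m) for p, m, c in zip(pattern, mask, code))


# Hypothetical dataset entry keyed by the function GUID from the example above.
dataset = {
    uuid.UUID("7a55be03-76b7-5cb5-bae9-4edcf47795ac"): ["example_function"],
}
pattern = bytes.fromhex("55e800000000")
mask = bytes.fromhex("ffff00000000")
```

The GUID lookup only succeeds if every tool computes byte-identical GUIDs, while the masked comparison tolerates differences in the wildcarded bytes at the cost of looser matching.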


[1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=9583
[2]: https://devblogs.microsoft.com/oldnewthing/20221109-00/?p=107373
[3]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf
[IDA FLIRT Documentation]: https://docs.hex-rays.com/user-guide/signatures/flirt/ida-f.l.i.r.t.-technology-in-depth
[Binary Ninja]: https://binary.ninja
[Binary Ninja Integration]: https://github.com/Vector35/binaryninja-api/tree/dev/plugins/warp
