Tutorial: Extending Proteus

marton bognar edited this page Sep 10, 2024 · 1 revision

This page presents a simple extension to the processor in the form of an assignment. Completing this assignment can be a good way of getting familiar with the processor and its ecosystem.

For this project, you will be implementing a processor extension to protect against return address smashing attacks. The idea is simple: whenever a function call is made, the return address is encrypted. Before returning from a function, the return address is decrypted again.

The extension will be implemented in two steps. First, the encryption and decryption functionality is added while using a hard-coded key. Then, a new instruction should be implemented to dynamically set the key from software. But before creating this return address smashing defence, you are asked to create a proof-of-concept attack.

Exploiting a buffer overflow on RISC-V

First, create a file smash.c with the following contents:

#include <stdio.h>
#include <stdint.h>

#include "interrupts.h"

void smash_this() {
    char buf[128] = {0};
    gets(buf);

    printf("stdin: ");

    for (size_t i = 0; buf[i] != 0; ++i)
        printf("%02x ", buf[i]);

    printf("\n");
}

int main() {
    enable_interrupts();
    smash_this();
}

This file contains a function that reads a string from the standard input and then prints the hexadecimal ASCII values of all characters to the standard output. Since gets stores all input from the standard input until the next newline into the destination buffer, this function is clearly vulnerable to a buffer overflow attack.

Task 1: Craft a payload that, when provided as the standard input of the exploitable program, prints the string "p0wned!" to the standard output.

Some hints (which you do not have to follow):

  • Embed the string in your payload and call puts with a pointer to this string;
  • Run the program in the simulator and use GTKWave to find the value of the stack pointer when entering the smash_this function. This will give you the address of buf;
  • Certain addresses (e.g., puts) need to be hard-coded;
  • Make sure the only 0x0a (newline) byte in your payload is at the very end.
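The hints above imply a payload layout: code and data at the start of buf, filler up to the saved return address, the new return address in little-endian byte order, and a single terminating newline. A minimal sketch of that layout follows. The helper name build_payload and its parameters are hypothetical, and ra_offset (the distance from buf to the saved return address) is an assumption that you have to measure yourself, for example in GTKWave or from the disassembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Proteus or newlib.
 * Lays out: [code | filler | new_ra (little-endian) | '\n'].
 * ra_offset is an assumed, compiler-dependent distance from the start of
 * buf to the saved return address; it must be at least code_len.
 * Returns the payload length, or 0 if the payload cannot be built.
 */
size_t build_payload(uint8_t *out, size_t cap,
                     const uint8_t *code, size_t code_len,
                     size_t ra_offset, uint32_t new_ra)
{
    size_t total = ra_offset + 4 + 1; /* filler region + address + newline */

    if (code_len > ra_offset || total > cap)
        return 0;

    memcpy(out, code, code_len);
    memset(out + code_len, 'A', ra_offset - code_len);

    /* RISC-V is little-endian: least significant byte first. */
    for (int i = 0; i < 4; ++i)
        out[ra_offset + i] = (uint8_t)(new_ra >> (8 * i));

    out[total - 1] = '\n';

    /* gets stops at the first newline, so none may appear earlier. */
    if (memchr(out, '\n', total - 1) != NULL)
        return 0;

    return total;
}
```

If the function returns 0 because of a stray 0x0a byte, the code or the target address has to be adjusted (for instance by padding the code so that the address changes) before the payload can be fed to gets.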

Running the simulation

You can use the files in the newlib directory as a template for running the simulation (run make smash.bin in this directory if smash.c is saved there). The standard input of the simulator is attached to the standard input of the program running in the simulator. The program can therefore be run as follows (which might take a second or two):

$ echo 'Hello' | ../sim/build/sim smash.bin
stdin: 48 65 6c 6c 6f

If you implement your payload in shellcode.s, you can compile it to a binary file that you can then supply as the input to the simulation.

First, create a linker script shellcode.ld:

OUTPUT_FORMAT("elf32-littleriscv", "elf32-littleriscv", "elf32-littleriscv")
OUTPUT_ARCH(riscv)
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = 0x0);
  . = 0x0;
  .text           :
  {
    *(.text)
  }
}

Then, you can compile your payload as follows:

make shellcode.o
riscv32-unknown-elf-gcc -ffreestanding -nostdlib -T shellcode.ld -o shellcode.elf shellcode.o
make shellcode.bin

Encrypting and decrypting return addresses

To be able to encrypt and decrypt return addresses, we have to identify when function calls and returns are made. Unfortunately, RISC-V does not define dedicated instructions for calls and returns and instead uses unconditional jumps in both cases. This means we need a heuristic to distinguish calls and returns from normal unconditional jumps.

All unconditional jump instructions in RISC-V store the return address in the destination register. The RISC-V calling convention specifies the use of the x1 (ra) register for the return address of function calls. Therefore, unconditional jump instructions that store their return address in x1 can be considered function calls. Try to find a similar heuristic for identifying function returns (hint: look at the assembly generated by GCC).
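The call heuristic can be modelled in software before it is written as hardware logic. The sketch below decodes a 32-bit instruction word and checks whether it is a JAL or JALR that writes x1. The function names are hypothetical, compressed (16-bit) instructions are assumed not to be in use, and the return heuristic is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_JAL  0x6fu /* major opcode 1101111 */
#define OPCODE_JALR 0x67u /* major opcode 1100111 */
#define REG_RA      1u    /* x1, the return address register */

/* The opcode lives in bits [6:0] of every 32-bit instruction. */
static bool is_unconditional_jump(uint32_t insn)
{
    uint32_t opcode = insn & 0x7fu;
    return opcode == OPCODE_JAL || opcode == OPCODE_JALR;
}

/* Hypothetical software model of the heuristic: a jump whose
 * destination register (bits [11:7]) is x1 is treated as a call. */
bool is_call(uint32_t insn)
{
    uint32_t rd = (insn >> 7) & 0x1fu;
    return is_unconditional_jump(insn) && rd == REG_RA;
}
```

For example, jal ra, 0 (0x000000ef) is classified as a call, while a plain jump j 0 (0x0000006f) is not, because its destination register is x0.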

For the jumps that are identified as calls, their return address should be encrypted before being stored in their destination register. Similarly, for returns, the target address should be decrypted before jumping to it. The goal of this project is not to implement cryptographic primitives, so a simple xor or similar operation suffices.
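To see why even an xor helps, consider the following behavioural sketch. The function names and the key value are made up for illustration; the actual logic belongs in hardware, not in C.

```c
#include <stdint.h>

/* Hypothetical hard-coded key, chosen arbitrarily for this example. */
#define RA_KEY 0x5a17c0deu

/* On a call: the value written to the destination register. */
uint32_t encrypt_ra(uint32_t ra, uint32_t key)
{
    return ra ^ key;
}

/* On a return: the address that is actually jumped to.
 * xor is its own inverse, so decryption is the same operation. */
uint32_t decrypt_ra(uint32_t stored, uint32_t key)
{
    return stored ^ key;
}
```

A legitimately stored return address survives the round trip, since (ra ^ key) ^ key == ra. An attacker who overwrites the stored value with a plain address X ends up jumping to X ^ key instead, which is only useful if the key is known.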

Task 2: Add logic to the BranchUnit plugin for encrypting and decrypting return addresses. Use a simple cryptographic primitive, such as an xor operation. The key used for encrypting and decrypting values should be hard-coded.

In principle, the only code that needs to be changed is that of the BranchUnit plugin in src/main/scala/riscv/plugins/BranchUnit.scala.

Setting the key dynamically

To be able to set the key used for encrypting return addresses from software, we need to add a new instruction. The RISC-V unprivileged ISA specification dedicates a full chapter (Chapter 35) to extending RISC-V. For our current purposes, however, Table 70 contains all necessary information. Every cell in this table marked with "custom-i" may be used to encode custom instructions.

The new instruction needs the 32-bit key as input and has no output value. This means the R-type encoding can be used while ignoring the rs2 and rd fields (which means setting them to fixed values).
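As an illustration of such an encoding, the sketch below assembles an R-type instruction word from its fields. Using the custom-0 major opcode (0001011) and setting funct3, funct7, rd and rs2 to zero are assumptions made for this example; any free custom encoding works, and the names encode_r_type and encode_set_key are hypothetical.

```c
#include <stdint.h>

#define OPCODE_CUSTOM_0 0x0bu /* major opcode 0001011 */

/* R-type layout:
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t encode_r_type(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                       uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    return ((funct7 & 0x7fu) << 25)
         | ((rs2    & 0x1fu) << 20)
         | ((rs1    & 0x1fu) << 15)
         | ((funct3 & 0x07u) << 12)
         | ((rd     & 0x1fu) << 7)
         |  (opcode & 0x7fu);
}

/* Hypothetical "set key" instruction: only rs1 carries information,
 * the unused rd and rs2 fields are fixed to x0. */
uint32_t encode_set_key(uint32_t rs1)
{
    return encode_r_type(OPCODE_CUSTOM_0, 0, 0, 0, rs1, 0);
}
```

With these choices, an instruction that takes the key from a0 (x10) encodes to 0x0005000b. A word like this can typically be placed in a test program with a raw .word directive, or with the GNU assembler's .insn directive if your toolchain supports it.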

The default value for the key should be zero and this should be interpreted as disabling the encryption and decryption of return addresses.

Task 3: Implement a new instruction that reads a key from rs1 and makes sure that key is used for any subsequent encryptions and decryptions of return addresses. When the key is set to zero, the return address protection feature is disabled.
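The intended behaviour of the key register can be sketched as a small software model. This assumes the xor primitive from Task 2; the ra_guard type and function names are hypothetical and only describe the semantics, not the hardware implementation.

```c
#include <stdint.h>

/* Hypothetical model of the key register added to the processor. */
typedef struct {
    uint32_t key; /* zero means: protection disabled */
} ra_guard;

/* Reset state: the key defaults to zero. */
void guard_reset(ra_guard *g)
{
    g->key = 0;
}

/* Effect of the new instruction: copy the value of rs1 into the key. */
void guard_set_key(ra_guard *g, uint32_t rs1_value)
{
    g->key = rs1_value;
}

/* On a call: value stored as the return address. */
uint32_t guard_on_call(const ra_guard *g, uint32_t ra)
{
    return g->key != 0 ? ra ^ g->key : ra;
}

/* On a return: address that is jumped to. */
uint32_t guard_on_return(const ra_guard *g, uint32_t stored)
{
    return g->key != 0 ? stored ^ g->key : stored;
}
```

One consequence worth keeping in mind when writing test programs: a return address that was stored under one key no longer decrypts correctly after the key has been changed. The function that sets the key should therefore not return through an address that was saved before the change.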

As with the previous task, the implementation of the new instruction can be done by modifying the BranchUnit plugin.