The global architecture and designs for this entire project is presented here.
Explanations for our simulator can be found here.
Explanations for the assembly part is here.
Our microprocessor is build using the Minijazz netlist format (more info about its insides [here]
The clock is initialized putting current unix time in ram separated in seconds, minutes, hours, day, month and year. Then, using a double dabble algorithm, we place the 14 values (yyyy/dd/mm hh:mm:ss) inside 14 regiters from x0 to x13. We also put in ram a value updated each time a second has passed. The main algorithm works as follow, everytime this value is set to 1, we update the global second counter and do a serie of check to update the 14 registers. To print the values, we use a 7-segment logic built in the simulator.
The clock takes some time to initialized since it's doing a double dabble for the years, months, days, hours, minutes and seconds which takes approximately 600 cycles.
We’ve chosen to make our project using the 32 bits RISC-’s architecture. We’ll use 32 registers.
We’ll use a total of 28 instructions to make operations between unsigned integers. We’ll encode our instructions in different ways according to the instruction’s format (operation with immediates I, with jumps U, only on registers R). The different operations we’ll use are:
-
add, sub (type R) and addi (type I). Our arithmetical instructions.
-
sll, srl, sra Of type R and their equivalent of type I to make bitwise shifts.
-
logical operations and, or and xor of type R and their equivalent of type I.
-
tests operations slt and slti.
-
lui and auipc are special instructions of type U, used to move in the stack.
-
beq, bne, blt, blti (type I) and bge for conditional branching. Except for blti, they’re of type R.
-
jal and jalr of type U for unconditional jumps.
-
lw (type I) and sw (type R) load and store datas from and to memory.
Here are the different encoding for the different instruction formats. These formats help us manipulate immediates of big size. It’s probably not mandatory but it can come handy in some particular cases.
rd represents the destination register and rsX the sources ones.
31 20 | 19 15 | 14 10 | 9 5 | 4 0 | |
---|---|---|---|---|---|
R |
immédiat |
rs2 |
rs1 |
rd |
opcode |
I |
immédiat |
rs |
rd |
opcode |
|
U |
immédiat |
rd |
opcode |
Our instructions will have the following opcodes:
000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 | ||
---|---|---|---|---|---|---|---|---|---|
00 | Addi | Srli | Srai | Slli | Andi | Ori | Xori | ||
01 | Add | Sub | Srl | Sra | Sll | And | Or | Xor | |
10 | Lw | Sw | Beq | Bne | Blt | Blti | Bge | ||
11 | Lui | Auipc | Jal | Jalr | Slt | Slti |
instruction | format | opcode | usage | result |
---|---|---|---|---|
Addi | I | 00000 | addi x0 x1 1 | x0 = x1 + 1 |
Srli | I | 00010 | srli x0 x1 1 | x0 = x1 >> 1 (0 extended) |
Srai | I | 00011 | srai x0 x1 1 | x0 = x1 >> 1 (sign extended) |
Slli | I | 00100 | slli x0 x1 1 | x0 = x1 << 1 |
Andi | I | 00101 | andi x0 x1 1 | x0 = x1 & 1 |
Ori | I | 00110 | ori x0 x1 1 | x0 = x1 | 1 |
Xori | I | 00111 | xori x0 x1 1 | x0 = x1 ⊕ 1 |
Add | R | 01000 | add x0 x1 x2 1 | x0 = x1 + x2 |
Sub | R | 01001 | sub x0 x1 x2 1 | x0 = x1 − x2 |
Srl | R | 01010 | srl x0 x1 x2 1 | x0 = x1 >> x2 (0 extended) |
Sra | R | 01011 | sra x0 x1 x2 1 | x0 = x1 >> x2 (sign extended) |
Sll | R | 01100 | sll x0 x1 x2 1 | x0 = x1 << x2 |
And | R | 01101 | and x0 x1 x2 1 | x0 = x1 & x2 |
Or | R | 01110 | or x0 x1 x2 1 | x0 = x1 | x2 |
Xor | R | 01111 | xor x0 x1 x2 1 | x0 = x1 ⊕ x2 |
Lw | I | 10000 | lw x0 x1 1 | x0 = Ram(x1) |
Sw | R | 10001 | sw x0 x1 x2 1 | Ram(x2) = x1 |
Beq | R | 10011 | beq x0 x1 x2 1 | rdi = (x1 == x2) ? rdi + 1 : rdi + 4 |
Bne | R | 10100 | bne x0 x1 x2 1 | rdi = (x1 != x2) ? rdi + 1 : rdi + 4 |
Blt | R | 10101 | blt x0 x1 x2 1 | rdi = (x1 < x2) ? rdi + 1 : rdi + 4 |
Bge | R | 10111 | bge x0 x1 x2 1 | rdi = (x1 ≥ x2) ? rdi + 1 : rdi + 4 |
Jal | U | 11010 | jal x0 1 | rdi = 1 |
Slt | R | 11110 | slt x0 x1 x2 3 | x0 = (x1 < x2) ? 1 : 0 |
Slti | I | 11111 | slti x0 x1 3 | x0 = (x1 < 3) ? 1 : 0 |
(You can check the github repository)
To see how to run and try the simulator, chek the HOWTO.md file
Behavior for global variables and registers:
- The simulator is interpreting every equations from the netlist at each steps (or infinitely if no number of steps was specified) using environments to spread infos between loops.
- The environments are tables mapping identifiers to their current value (a Vbit or a VBitArray).
- We're using two different environments, one current and one representing the environment state in the last step. Doing so we can manage the registers' one step delay by reading the register value from the previous environment.
- To interpret each equations we devide it into its identifier and its expression.
- We then interpret the so call expression using the
calculExp
function which, given an expression, call the corresponding function that interprets that kind of expression and update the given environment.
Behavior for memories:
- We've chosen to give a fixed
address size
of 16 bits for both the RAM's and the ROM's addresses. - We've chosen to give a fixed
word size
of 32 bits for both the RAM's and the ROM's values. - Like the two others environments, memories will be repesent by tables mapping addresses (represented as strings) to their correesponding value.
- At the begining of the simulator, we create an empty ROM and an empty RAM. By 'empty' we mean that it contains 2^(
address size
) keys, all having an empty VBitArray of sizeword size
. - Since the ROM is a read only memory, we only need one environment to represent it.
- For the RAM, we're using the same trick as for the registers which mean having a second environment representing the state of the RAM during the last step. When reading from the RAM we're reading the value corresponding to the address in the previous environment while when we're writting to it, we're doing it in the current environment.
To represent the clock we use 14 7-segments:
- R :
instr rd rs1 rs2
- R :
instr rd rs1 rs2 imm
- R :
instr rd rs1 rs2 lab
- I :
instr rd rs imm
- I :
instr rd rs lab
- U :
instr rd imm
- U :
instr rd lab
Comments starting by
#
Labels are strings of chars and digits followed by a:
R types with labels are the conditionnal branching, just need to append the label (or immediate) to the instruction.
So the syntax is first the register output, then the two registers or register immediate or immediate/label.
Run make compiler
to create an executable named asm
. It runs then on files with name of the format file.x
(because its cool 😎).
A tests
directory is provided but we need more automatization. For the moment, the assembler takes care of immediates on 16 bits in two's complement. It's not optimal we might want to extend the range of immediates with different types of instructions.
The instructions are coded in little endian and the format is the one from our report so on the string in reading order we have opcode then output register, immediate and input registers.
The design of our microprocessor is similar to the one we've seen in class.
We've divided it into 6 major blocks:
-
The rdi: which is used to calculate the address of the next instruction in ROM.
-
The rom: which is a read only memory that contains the program we want to run.
-
The di: which get all the informations encoded in the instructions comming from the ROM.
-
The reg: which represents our registers logic.
-
The alu: which is our main arithmetic and logic unit.
-
The ram: which is our mutable memory.
Module to get the address in rom of the next instruction to read.
If the current instruction in not a branchment, we set our first selector (instr_is_jmp) to 0.
Otherwise, we have two situations:
- Inconditional jump (jal)
-> The ALU return 1 (extended to 32 bits)
- Conditional jump (beq, bne, ...)
-> The ALU return 1 if the condition is valid, 0 if not
For both cases we start by putting the selector instr_is_jmp to 1. Then, if the ALU returned value is 1, we set another selector (jmp_cond_fullfiled) to 1.
We then get two possible addresses for the next instruction to read:
- add1 = last address + 4 (if we do not want to jump)
- add2 = immediate extended to 32 bits + old address, immediate comming from the branching instruction which represents the label where we want to jump (i.e. the offset from the current address to the one we want to jump to)
Afterward, we use two multiplexor:
add3 = jmp_cond_fullfiled as the selector, add1 as the first value, add2 as the second value
add4 = instr_is_jmp as the selector, add1 as the first value, add3 as the second value
finally we return add4.
It takes as input an address next_instr
from rdi and returns the concatenate value of the four values inside the ROM at this next_instr
, next_instr + 1
, next_instr + 2
, next_instr + 3
.
Addresses on 16 bits (so the netlist can be simulated in a decent amount of time).
List of registers: Variable("x0", REG_SIZE), Variable("x1", REG_SIZE), ..., Variable("x1", REG_SIZE)
To read a value inside a register given an address r
, we get the values for all 32 registers and then use
a multiplexor (as in the ALU) using r
as the selector to get only the value we're looking for.
To write a value inside a register given an address rd
, we actually replace the values in all the registers.
To choose the new value to put inside a register i
, we use two multiplexors, one using rd
as a selector
and then another one using wenable
as a selector so we only modify the value for the register which's address matches rd
and only if wenable
is set to true. The value we're actually writting is choosed
between the value from the ALU and the value from the RAM using
the instruction opcode as the multiplexor selector.
Lasty, reg
return 2 variable, i1
(the value read inside the register of address rs1
) and i2
. i2
can get two possible values, if the instruction is of type I
then i2
is the immediate given as a parameter for the get
function, otherwise, i2
is the value read inside the register of address rs2
.
This is the main Arithmetic and Logic unit.
It takes as input two variables (in 32 bits), and the instruction opcode to return the wanted value.
To do so, it calculates in parallel all the different possible outcomes for all the possible instructions using the two variables inputs. Then, with a multiplexor it chooses the wanted outcome using the opcode as the selector.
The ALU can perform arithmetical operations, shift operations, logic operations, ram manipulations and branching. For branching, if the condition is respected then the output would be a 1 extended to 32 bits otherwise it will be a 0 extended as well.
We have three diferent shifts, logical left shift (multiplies by 2), logcial right shift (only 0 for replacing, divides by 2) and arithmetical right shift (divides by 2 conserving the sign).
Uses only 3 inputs, the first input is used as the read address or a write data depending on the instruction, the second decides if we should write in the RAM or not and the last one is the write address.
All addresses are on 16 bits (so the netlist can be simulated in a decent amount of time).
The decoder takes as inputs an instruction comming from the ROM and interprets it to produced 6 values:
- an
opcode
, which represents the opcode encoded in the instruction. - an
imm
, which represents the immediate encoded in the instruction. - a
rs1
, which represents the first register address encoded in the instruction (for formatU
, this value will not be taken into account later in the netlist). - a
rs2
, which represents the second register address encoded in the instruction (for formatsU
andI
, this value will not be taken into account later in the netlist). - a
rd
, which represents the register address of the destination encoded in the instruction. - a
wenableReg
, which is a single bit telling us if we should write the last result in a register. - and a
wenableRAM
, which is a single bit telling us if we should write the last result in the RAM.
The folder tests
contains some tests for the microprocessor different blocks.
Example of a main to put inside proc_netlist that uses a file from the tests directory.
# inside proc_netlist/main.py
import os
import sys
sys.path.insert(0, os.getcwd())
# example for test_alu
from tests import test_alu
def main() -> None:
test_alu.test_alu()
The folder global_utils
is a python module containing utility functions and global constants.
To build everything you can either enter each subfolders and build anything particular part of the project you want or you can use the command make
which will build the assembler, the simulator and the netlist corresponding to the microprocessor.
To run a program written in our assembly language, you can do the following:
# ./run.sh <file.x>
#
# This example will run the program ./assembler/tests/test_ram.x infinitely
./run.sh ./assembler/tests/test_ram.x
Or, if you want the programm to run only for a given amount of step, you can add an option at the end of the command:
# ./run.sh -n <nb_step> <file.x>
#
# This example will run the program ./assembler/tests/test_ram.x during 11 simulator steps
./run.sh -n 11 ./assembler/tests/test_ram.x
You can also add an option to choose if you want to print the registers at each step or to print a 7-segment formated output to represent a clock (using registers from x0 to x13).
# ./run.sh -n <nb_step> --clk <file.x>
#
# This example will run the program ./assembler/tests/test_7seg.x during 11 simulator steps and output it as a clock
./run.sh -n 11 --clk ./assembler/tests/test_7seg.x
To run the digital clock in real time you can do:
./run.sh --clk ./assembler/tests/clk_real_time.x
For the fast-forward mode, you can do:
./run.sh --clk ./assembler/tests/clk_goes_vroom.x
Notice that the clock takes some time to initialize because of the double dabble algorithm.