Run make
from the root directory. It will generate latc
and latc_llvm
executables. They are the same binaries, but the project description specified both of them.
You can run the compiler with ./latc [[-ir] file]
or ./latc_llvm [[-ir] file]
. If you don't provide a file, it will read from stdin
until ctrl + d
. If you provide a filename, you can choose to generate only .bc
and .ll
files or use the -ir
flag to also generate the additional intermediate files. The intermediate files are
.tac
file that has the intermediate code, which is a modified LLVM allowing for copy (x = y
etc) and doesn't require strict SSA form..ai
a very simple file containing alive variables at the beginning and ending of each block.
Currently, the frontend and backend are not in sync. Frontend allows for virtual methods and inheritance, while backend does not, however I already have some extensions/optimisations implemented for the backend and if possible, would like to ask for feedback about them.
- arrays
- classes with virtual methods
- when checking if all paths in a function/method have a return statement, the TypeChecker considers while loops and if statements. It can evaluate "constant expressions" which are a subset of expressions containing only
int
orbool
literals. self
is optional. The default name resolution is:- local
- inheritance
- global Since there are no global variables or local functions, that means that methods are first searched in current class and superclasses and then outside, while variables are first searched in the local scope and then in the current class and superclasses.
- superclasses MUST be declared before subclasses
- usage of phi functions instead of alloca
- copy propagation
- constant folding & propagation
- unreachable block cleanup
- arrays
- structures
- code is adjusted for constexpressions in conditionals (aside from constant propagation). For example
if (true || false) {
return 1;
} else {
return 2;
}
return 3;
Will get compiled to ret i32 1
.
- the compiler does not generate blocks unreachable from
entry
block - string literals are global constants
- internally, the compiler mangles function names, so that all user-usable functions are prefixed with "_" to avoid user being able to try to redefine functions used internally, for example
malloc
. - arrays are simply structures with two fields. One contains a pointer to array elements, the second is the array's length.
- src
Latte.cf
-- grammarConstExpr.hs
-- logic for computing constant expressionsInstr.hs
-- definition of the intermediate code and most of the types used. In addition to that, provides instances ofShow
that are used when generating code.IRCompiler.hs
-- translates the program into intermediate representation (poor version without Phi nodes)Liveness.hs
-- performs flow analysis to compute alive variables at the beginning and end of each blockSsa.hs
-- cleans up the poor IR form. Does copy propagation and constant propagation/folding, transforms the IR to strict SSA form using Phi nodesTypeChecker.hs
-- contains the TypeChecker logicMain.hs
-- helper that parses the program, invokes the TypeChecker and the Compiler. It is used to create an executable file.
- lib
runtime.bc
with its sourceruntime.ll
orruntime_darwin.ll
providing implementations of latte predefined functions and utilities likemalloc
andaddStrings
. There are two source files which are mostly the same, but differ in the way standard input is defined.runtime.bc
will get built on everymake
execution.
- bnfc
- alex
- happy
- containers
- mtl
- process
- filepath
Functions toLLVMEscaped
and toLLVMString
in Instr.hs
were generated by chatgpt.
The SSA algorithm was mostly based on (especially the general idea and writeVal
and readVal
) "Simple and Efficient Construction of Static Single Assignment Form" [Braun, Buchwald, Hack, Leißa, Mallon, Zwinkau 2013] with some modifications -- for example, I do liveness analysis to avoid visiting predecessors for each unknown name.