Skip to content

An LLVM compiler for a made up Java-like imperative language

Notifications You must be signed in to change notification settings

AaRaBiNoZa/latte-compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Usage

Run make from the root directory. It will generate latc and latc_llvm executables. They are the same binaries, but the project description specified both of them.

You can run the compiler with ./latc [[-ir] file] or ./latc_llvm [[-ir] file]. If you don't provide a file, it will read from stdin until ctrl + d. If you provide a filename, you can choose to generate only .bc and .ll files or use the -ir flag to also generate the additional intermediate files. The intermediate files are

  • .tac file that has the intermediate code, which is a modified LLVM allowing for copy (x = y etc) and doesn't require strict SSA form.
  • .ai a very simple file containing alive variables at the beginning and ending of each block.

Important

Currently, the frontend and backend are not in sync. Frontend allows for virtual methods and inheritance, while backend does not, however I already have some extensions/optimisations implemented for the backend and if possible, would like to ask for feedback about them.

Extensions - Frontend

  • arrays
  • classes with virtual methods

Notes - Frontend

  • when checking if all paths in a function/method have a return statement, the TypeChecker considers while loops and if statements. It can evaluate "constant expressions" which are a subset of expressions containing only int or bool literals.
  • self is optional. The default name resolution is:
    • local
    • inheritance
    • global Since there are no global variables or local functions, that means that methods are first searched in current class and superclasses and then outside, while variables are first searched in the local scope and then in the current class and superclasses.
  • superclasses MUST be declared before subclasses

Extensions - Backend

  • usage of phi functions instead of alloca
  • copy propagation
  • constant folding & propagation
  • unreachable block cleanup
  • arrays
  • structures

Notes - Backend

  • code is adjusted for constexpressions in conditionals (aside from constant propagation). For example
if (true || false) {
  return 1;
} else {
  return 2;
}
return 3;

Will get compiled to ret i32 1.

  • the compiler does not generate blocks unreachable from entry block
  • string literals are global constants
  • internally, the compiler mangles function names, so that all user-usable functions are prefixed with "_" to avoid user being able to try to redefine functions used internally, for example malloc.
  • arrays are simply structures with two fields. One contains a pointer to array elements, the second is the array's length.

Directories

  • src
    • Latte.cf -- grammar
    • ConstExpr.hs -- logic for computing constant expressions
    • Instr.hs -- definition of the intermediate code and most of the types used. In addition to that, provides instances of Show that are used when generating code.
    • IRCompiler.hs -- translates the program into intermediate representation (poor version without Phi nodes)
    • Liveness.hs -- performs flow analysis to compute alive variables at the beginning and end of each block
    • Ssa.hs -- cleans up the poor IR form. Does copy propagation and constant propagation/folding, transforms the IR to strict SSA form using Phi nodes
    • TypeChecker.hs -- contains the TypeChecker logic
    • Main.hs -- helper that parses the program, invokes the TypeChecker and the Compiler. It is used to create an executable file.
  • lib
    • runtime.bc with its source runtime.ll or runtime_darwin.ll providing implementations of latte predefined functions and utilities like malloc and addStrings. There are two source files which are mostly the same, but differ in the way standard input is defined. runtime.bc will get built on every make execution.

Libraries

  • bnfc
  • alex
  • happy
  • containers
  • mtl
  • process
  • filepath

Borrowed code

Functions toLLVMEscaped and toLLVMString in Instr.hs were generated by chatgpt.

The SSA algorithm was mostly based on (especially the general idea and writeVal and readVal) "Simple and Efficient Construction of Static Single Assignment Form" [Braun, Buchwald, Hack, Leißa, Mallon, Zwinkau 2013] with some modifications -- for example, I do liveness analysis to avoid visiting predecessors for each unknown name.

About

An LLVM compiler for a made up Java-like imperative language

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published