This is the first step of my journey around the LLVM compiler infrastructure.
To get familiar with LLVM IR instructions, I first used clang
to see how it compiles a simple C program to LLVM IR.
For example,
int main() {
    char c;
    int i;
    long l;
    c = 'a';
    i = 72;
    l = 123456789012345;
}
this simple C program is compiled to the following LLVM IR instructions.
define i32 @main() {
  %1 = alloca i8, align 1
  %2 = alloca i32, align 4
  %3 = alloca i64, align 8
  store i8 97, i8* %1, align 1
  store i32 72, i32* %2, align 4
  store i64 123456789012345, i64* %3, align 8
  ret i32 0
}
Try clang -S -emit-llvm -O0 test.c to see this yourself. The output is written to test.ll; you can ignore the attributes and comments in it.
Now you have learned the define keyword and the alloca, store, and ret instructions in one minute.
It's very easy, isn't it?
The clang compiler shows us which LLVM IR instructions correspond to a given piece of C code.
In my opinion, this is the very first step in learning the LLVM compiler infrastructure.
The next step is generating LLVM IR instructions from a program of your own. Are you going to try the C++ API? Are you about to start the Kaleidoscope tutorial? Please wait: they are still difficult and complicated for beginners. If you can keep moving forward with them, go ahead, but they were too difficult for me.
Brainf**k is like Hello world for learners of compilers and interpreters. I like this language. Its syntax is so simple that we can write an interpreter for it in an hour, yet it is famous enough that complicated programs can be found online (like mandelbrot.b written by Erik Bosman). We can pick up such Brainf**k programs to check whether our implementation works correctly. Write code for just 8 instructions, and a Mandelbrot set is drawn on the terminal. It's like a miracle!
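To show how small those 8 instructions really are, here is a minimal interpreter sketch in C. It is not the code from this post: the function name bf_run, the 30000-cell tape, wrapping 8-bit cells, and returning 0 on end of input are all my assumptions for illustration, and it collects output in a buffer instead of printing so that it is easy to check.

```c
#include <string.h>

#define TAPE_SIZE 30000  /* assumed tape length; a common convention */

/* Hypothetical helper: runs the Brainf**k program `prog`, reading input
 * bytes from the string `in` and writing output bytes to `out`.
 * `out_cap` must be at least 1; output beyond out_cap - 1 bytes is dropped.
 * Returns 0 on success, -1 on unbalanced brackets or if the data pointer
 * leaves the tape (in which case `out` is left empty). */
int bf_run(const char *prog, const char *in, char *out, size_t out_cap)
{
    unsigned char tape[TAPE_SIZE] = {0};
    size_t ptr = 0;             /* data pointer */
    size_t n = 0;               /* bytes written to out so far */
    size_t len = strlen(prog);

    out[0] = '\0';
    for (size_t pc = 0; pc < len; pc++) {
        switch (prog[pc]) {
        case '>': if (++ptr >= TAPE_SIZE) { out[0] = '\0'; return -1; } break;
        case '<': if (ptr == 0) { out[0] = '\0'; return -1; } ptr--; break;
        case '+': tape[ptr]++; break;   /* unsigned char wraps at 256 */
        case '-': tape[ptr]--; break;
        case '.': if (n + 1 < out_cap) out[n++] = (char)tape[ptr]; break;
        case ',': tape[ptr] = *in ? (unsigned char)*in++ : 0; break;
        case '[':
            if (tape[ptr] == 0) {       /* skip forward to the matching ']' */
                int depth = 1;
                while (depth > 0) {
                    if (++pc >= len) { out[0] = '\0'; return -1; }
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[ptr] != 0) {       /* jump back to the matching '[' */
                int depth = 1;
                while (depth > 0) {
                    if (pc == 0) { out[0] = '\0'; return -1; }
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default:
            break;                      /* any other character is a comment */
        }
    }
    out[n] = '\0';
    return 0;
}
```

For example, calling bf_run("++++++++[>++++++++<-]>+.", "", out, sizeof out) leaves the string "A" in out, because the loop builds 8 * 8 = 64 in the second cell and the final + makes it 65. The bracket matching rescans the program on every jump, which is slow for something like mandelbrot.b but keeps the sketch short.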
I first wrote bf2llvm.c.
Of course, I used clang -S -emit-llvm to see which instructions to output for each Brainf**k instruction.
As you can see, it directly outputs LLVM IR instructions.
It has no dependency on the LLVM library, and it is easy to see what instructions the code will generate.
But it is difficult to keep track of the temporary variable indices, it is easy to emit wrong code, and the output requires the lli command to run.
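The bookkeeping problem can be sketched with a small C helper. This is not taken from bf2llvm.c: the name emit_add, the stack slot %ptr (assumed to be an i8** holding the data pointer), and writing into a buffer are all hypothetical choices for illustration. It emits the IR for one Brainf**k + or - in the same typed-pointer syntax as the clang output above (recent LLVM versions spell pointers as ptr instead).

```c
#include <stdio.h>

/* Hypothetical helper: writes the IR for one '+' (delta = 1) or
 * '-' (delta = -1) into buf and returns snprintf's result.
 * Assumes the surrounding IR function keeps the data pointer in a
 * stack slot named %ptr of type i8**.
 * *next_tmp is the next unused temporary number; it is advanced by 3. */
int emit_add(char *buf, size_t cap, int *next_tmp, int delta)
{
    int cell_ptr = (*next_tmp)++;  /* address of the current cell */
    int old_val  = (*next_tmp)++;  /* value loaded from that cell */
    int new_val  = (*next_tmp)++;  /* value after adding delta    */

    return snprintf(buf, cap,
        "  %%%d = load i8*, i8** %%ptr, align 8\n"
        "  %%%d = load i8, i8* %%%d, align 1\n"
        "  %%%d = add i8 %%%d, %d\n"
        "  store i8 %%%d, i8* %%%d, align 1\n",
        cell_ptr,
        old_val, cell_ptr,
        new_val, old_val, delta,
        new_val, cell_ptr);
}
```

Every emitter has to thread the same counter through, because LLVM requires unnamed temporaries to be numbered sequentially; skipping or reusing a number makes the output fail to parse. Starting from next_tmp = 1, one call produces %1 to %3 and the following call continues at %4.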
Second, I wrote bf2llvm.cpp.
I explored the Doxygen-generated documentation to find the functions that generate the IR instructions I needed.
It's very fun to use the LLVM C++ API!
I don't need to take care of the indices of temporary variables or the precise type annotations that many instructions require.
The getOrInsertFunction function works like a charm, so I don't have to care about declarations outside the main function.
Object file generation is still difficult for me, but I learned a lot from chapter 8 of the Kaleidoscope tutorial.
Having written the two Brainf**k compilers, I think I am now more familiar with LLVM IR instructions than before.
After all, clang -S -emit-llvm is a better teacher for me than the complicated Doxygen documentation and the Kaleidoscope tutorial.
But now I'm going to try the Kaleidoscope tutorial (again; I gave up a few months ago).
Thank you, LLVM infrastructure and its contributors, you are great! I'm still a beginner in compiler techniques, but I know you gave me a way to generate executables for my own language. I'm excited to learn the LLVM infrastructure, and I want to contribute to LLVM-based language compilers and create my own language in the future!