This is a short guide to getting started with Lean 4 by writing simple programs. Lean 4 is a new language inspired by its predecessor, Lean 3. It is a dependently typed, pure functional programming language that is strictly evaluated. There are significant differences though and Lean 4 programs can be compiled down to C.
As with Lean 3, the best way to install Lean 4 is to use elan
:
$ TODO: install elan
elan
is a tool like ghcup
or rustup
. It installs a toolchain:
a bundle of programs for working with Lean projects that work
together.
However you install elan
, once it is on your path you can install
Lean 4:
$ elan toolchain install leanprover/lean4:nightly
If this is successful you should be able to run:
$ lean --version
Lean (version 4.0.0-nightly-2021-11-12, commit 781a28b8f4ba, Release)
Which displays the version of Lean and the git commit it was built from.
You will also need a toolchain for compiling C programs. In Ubuntu for example this would be:
$ sudo apt install build-essential
For MacOS you might use homebrew
or Chocolatey on Windows. For
those who prefer it there is a nix-based development environment that
sets everything up.
Lean shines as an interactive environment. It presently supports two editors: Emacs and VS Code. However, unlike Lean 3, it now speaks the Language Server Protocol which makes adding support to your editor much easier than before. If you don't use one of the supported editors it's worth using one or the other to try Lean out.
As is customary we will begin with printing, Hello, world! to the console. I use this as a quick check to make sure the toolchain is set up properly and everything works as expected. Start by creating a new package:
$ lake new HelloWorld && cd HelloWorld
This will create a new directory with a new git
repository
initialized, a package configuration file called lakefile.lean
,
toolchain file, a Main.lean
, and HelloWorld.lean
.
The lakefile.lean
is the package configuration file written in an
embedded domain specific language defined by
Lake. It
lets you specify the usual paths and build targets.
The toolchain file tells elan
which toolchain to use to build the
project.
Take a look at HelloWorld.lean
:
def hello := "world"
It contains a single definition. Now open up Main.lean
:
import HelloWorld
def main : IO Unit :=
IO.println s!"Hello, {hello}!"
If you've ever seen a Haskell tutorial this should look familiar.
Every Lean program that will be compiled to a native binary requires
an entry-point function named, main
. This function has to have the
type: IO Unit
.
If you have used Lean 3 you might notice that this is rather
different. We no longer need to import system.io
or "open" it.
Also we have IO
and Unit
with caps instead of io unit
. We even
have this nice substitution string, s!"World, {hello}!"
instead of
concatenating strings. And finally we have IO.println
instead of
put_str
. There's enough here to show that these are very different
languages.
To compile and run this program head to your shell and run:
$ lake build
After some output, if everything was successful, you should be able to run our first Lean 4 program:
$ ./build/bin/HelloWorld
Hello, world!
If something didn't work the best place to reach out is probably on
the Zulip chat server. The Lean community is very active there and if
someone hasn't already encountered your problem in the lean4
stream
there is probably someone who can help.
The next step is to modify our HelloWorld
example so that we can
prompt the user for some input and print out a message to them
containing it. Open up HelloWorld.lean
and add:
def hello2 : IO String := do
IO.println "What is your name? "
let stdin ← IO.getStdin
let name ← IO.FS.Stream.getLine stdin
pure s!"Hello, {name.trim}!"
Then modify Main.lean
:
def main : IO Unit := do
IO.println s!"Hello, {hello}!"
let greeting ← hello2
IO.println $ greeting
Compile with:
$ lake build
And run it again:
$ ./build/bin/HelloWorld
Hello, world!
What is your name?
Random J. Hacker
Hello, Random J. Hacker!
Excellent. We can get input from the user.
Take a peek inside the build/
directory and you will find:
bin/
ir/
lib/
bin/
contains the fully compiled native binary produced by your
platform's C tool-chain (for the platforms supported by Lean this
should be one of Clang or GCC) for the specified target(s) in your
package.
lib/
contains the *.olean
files. These are binary files used to
cache parsed *.lean
files with extra metadata about the definitions,
modules, etc.
ir/
this is probably the more interesting part if you like to peek
into the guts of what is being output by the Lean compiler: C code.
If you take a peek at a few of the .c
files you will see calls to
functions such as lean_inc
and lean_dec
. These come from Lean's
run-time library and are an artifact of how Lean does garbage
collection: reference counting. You will see these interspersed with
the various functions that map to our hello
and hello2
functions.
The scheme used here is from the Counting Immutable Beans paper by the language's inventor and his collaborators and is well worth a read.
Bonus: check out the lean4
source and poke around in
src/include/lean4/lean4.h
and src/runtime/io.cpp
.
Let's jump into a more substantial project and see what Lean 4 can do.
We're going to see if we can implement a small Unix utility: wc
.
This little program reads a stream of input and counts words,
characters, and lines.
A word is contiguous series of non-whitespace characters.
A line is a contiguous series of non-carriage return characters.
We'll limit ourselves to the ASCII character set to keep things simple.
First things first, create a new directory and initiate a project with Lake:
$ mkdir WordCount && cd WordCount && lake init
The first thing we will need is a data structure to maintain the state of our program as we read the input stream. It will keep the count of characters, words, and lines as well as a variable to track when we're in a word:
structure WordCount where
wordCount : Nat
lineCount : Nat
charCount : Nat
inWord : Bool
This is the way to define a product type with named fields in Lean,
also known as a structure as you can see from the structure
keyword. Each field definition contains a name on the left of the :
and a type after.
When Lean sees a structure
it will create a new type, namespace
(more on them later), and a few functions in that namespace. Try
adding the following to your source file:
#check WordCount.mk
#check WordCount.wordCount
If you're using an editor with an interactive Lean mode/plugin this
should display the type of those functions for you. #check
is a
command we can use to interact with Lean and ask it what the type of
some expression is. It acts like a comment as these lines will be
ignored when compiling your program. There are a handful of others
that are very helpful to learn which we will introduce as we go
along.
The first function there, WordCount.mk
is a convenient function for
constructing values of the type, WordCount
:
def emptyWordCount := WordCount.mk 0 1 0 false
Which would be fine for our case. However if you prefer to name the fields in your code you can also write it using anonymous structure syntax:
def emptyWordCount : WordCount :=
{ wordCount := 0
, lineCount := 1
, charCount := 0
, inWord := false
}
We introduce a new keyword here, def
, which introduces new
definitions. These can be functions, types, or values. In the
first example we give a definition without specifying a type: Lean can
infer the type of this definition from the constructor we use in the
expression.
In the second example using the anonymous structure syntax, Lean
doesn't know what the type of our definition is. We provide one in an
annotation on the definition after the :
.
Our plan of attack for this program is to update the state as we visit each character in the stream. That means we need a function that takes an accumulator state and one byte of input to compute the next state. Let's first write the definition of this function:
def countChar (wc : WordCount) (c : UInt8) : WordCount :=
sorry
We use sorry
here like undefined
. It's technically an error to
leave it in and our program won't check if we do. We're just telling
Lean, "sorry, I don't know how to define this right now." We can
still check other parts of the program in the meantime.
This is the common way to define pure functions in Lean. Most simple
functions you write will probably look something like this. Each
parameter is contained in a pair of parenthesis and the final :
allows us to annotate the type of this definition... in the case of a
function, the "return type" if you will. You can group like-typed
parameters in a single parenthesis like (x y : Int)
as needed.
If you are familiar with Haskell or ML-style type annotations this is equivalent to:
countChar :: WordCount -> UInt8 -> WordCount
If you are coming from C++ or Java:
WordCount countChar(WordCount wc, uint8_t char)
Let's replace that sorry
with a term that does what we want. Let's
start by increasing the character count:
def countChar (wc : WordCount) (c : UInt8) : WordCount :=
let wc := { wc with charCount := wc.charCount + 1 }
sorry
Here we have an example of a let
expression which binds a value to a
name for this expression. In this case we're rebinding wc
! The
parameter wc
doesn't change however. It's still immutable. The
reasons why this still works has to do with how Lean can elaborate
expressions. Just know that we're not mutating the original parameter
wc
and that if we rename our let-bound wc
to wc'
we can still
access the parameter wc
in the rest of the expression.
The next part of this is using the structure update syntax. Here
we're saying that we want a structure with the same values as wc
except we want the charCount
field to be this incremented value
from the prior state.
Moving on, let's add the state transition from being, "in a word" to being, "out of a word":
def countChar (wc : WordCount) (c : UInt8) : WordCount :=
let wc := { wc with charCount := wc.charCount + 1 }
if c == 32
then { wc with inWord := false }
else sorry
We have conditionals in Lean as you would expect. The if
here is an
expression and is in the body of the let
expression. In the case
where the character we're interrogating is an ASCII space then we
can transition out of a word as our next state. Otherwise...
def countChar (wc : WordCount) (c : UInt8) : WordCount :=
let wc := { wc with charCount := wc.charCount + 1 }
if (c == 32 || c == 13)
then { wc with inWord := false }
else if c == 10
then { wc with
lineCount := wc.lineCount + 1
inWord := false }
else sorry
We ask the character if it is an ASCII newline. If it is we return the next state as you would expect. And lastly we need to add the transition into a word:
def countChar (wc : WordCount) (c : UInt8) : WordCount :=
let wc := { wc with charCount := wc.charCount + 1 }
if (c == 32 || c == 13)
then { wc with inWord := false }
else if c == 10
then { wc with
lineCount := wc.lineCount + 1
inWord := false }
else if wc.inWord == true
then wc
else { wc with wordCount := wc.wordCount + 1,
inWord := true }
So to recap: we can let
bind names into a new scope. We can create
new structures from existing ones with new values for the fields we
care about using the structure update syntax, and we have if
expressions for interrogating boolean values.
The next thing we need to do is read our stream of bytes. We've
already seen the IO
type in our, "Hello, world!" program. It is
within the body of a function that returns this type in which our
programs can use other IO
returning functions. And the main entry
point to every Lean program is such a function:
def main : IO Unit := sorry
Anything where you want to read foreign memory, write to a stream
(such as using IO.println
), fork a thread, or read bytes from a
stream: they all have to be done in an IO
-returning function.
Lean 4 is evolving and as of this writing there is a spartan library
of IO
functions for reading and writing streams, creating processes,
sampling random generators, and more. Our program will need to read
from a stream and fold our countChar
function over that stream
from our initial emptyState
that we defined earlier. To do this we
define a helper function:
partial def IOfoldl {α} (f : α → UInt8 → α) (x : α) : IO α :=
sorry
Here we see the partial
keyword. By default Lean only allows total
functions. That means that we can only define functions where all
inputs map to an output and that also means that the function will
always terminate. This is useful for a lot of reasons but in programs
we cannot always make this guarantee. Lean 4 allows us to say, "this
function is partial, but trust me I will handle it."
We also have this funny-looking, {α}
thing. You can think of this
as a type variable. Lean 4 is a dependently-typed language and
there's more to it than this but for now it is a variable that stands
in for some type in the rest of our function definition. When we
call this function we will pass in a type for {α}
which will
determine what type it is. This is basically how we define
polymorphic functions in Lean 4.
You can write α
in Emacs using the key-sequence \a
and pressing
Enter
. It is similar in VS Code as well. Lean is not shy about
using unicode characters in a Lean source file. However if you do not
like them most have ASCII equivalents as we will see.
Next we have:
(f : α → UInt8 → α)
This defines a function parameter, f
whose type is: α → UInt8 → α
.
You can type the →
using \r
and pressing Enter
. If you're using
another editor other than the officially supported ones then you're on
your own.
We use this function parameter in IOfoldl
as our fold function.
If you look at the type of countChar
but replace the α
's with
WordCount
you will see it has the same shape and thus will be a
valid fit here.
See if you can figure out what the rest of the function signature for
IOfoldl
means.
Here is the complete definition:
partial def IOfoldl {α} (f : α → UInt8 → α) (x : α) : IO α := do
let stdin ← IO.getStdin
let stop ← stdin.isEof
if !stop
then
let cs ← stdin.read 4096
let x' := ByteArray.foldl f x cs
IOfoldl f x'
else
return x
Since this is a function returning an IO
value we can use other IO
functions in its body. We also see do
notation here for the first
time. It is similar to Haskell's do
notation. It allows us to
interleave IO
actions together where later IO
functions can depend
on the values of prior IO
functions. The first such function we
see:
let stdin ← IO.getStdin
The ←
here (typed \l
then Enter
or <-
in ASCII) binds the
result of IO.getStdin
to the let-bound name, stdin
. We need to
use the left-arrow to do this binding when we want to get the result
of an IO
function. If we use =
we will be binding the IO α
and
not the α
as we want in this case.
The type of the value bound to stdin
is IO.FS.Stream
. This is a
structure with a few fields containing useful functions for working
with file streams. You can find it's definition in the Lean 4 source
tree under: src/Init/System/IO.lean
:
structure FS.Stream where
isEof : IO Bool
flush : IO Unit
read : USize → IO ByteArray
write : ByteArray → IO Unit
getLine : IO String
putStr : String → IO Unit
We bind the isEof
result to stop
in our function so that we know
when we've reached the end of the stream and can stop processing any
more characters. This is basically why this function has to be
partial
: Lean doesn't know anything about the input stream: how long
it is, etc. So we cannot know if this function will ever terminate.
Fortunately that won't stop us from writing useful programs in Lean 4.
If there is more stream to process we use the read
function to read
in 4096 bytes from the stream at a time. We then determine the next
state by accumulating our state with f
over those bytes and loop on
IOfoldl
with our f and the newly computed state.
The last thing we need to do is to compose together our countChar
function with IOfoldl
and print out the results to the user.
First let's add a way to show the word count state to the user. Below
our definition of the WordCount
structure add the following:
Let's define one more function in WordCount.lean
:
instance : ToString WordCount where
toString wc := s!"Characters: {wc.charCount} / Words: {wc.wordCount} / Lines: {wc.lineCount}"
Lean 4 has type-classes much like Haskell. We will learn more about
them later but for now this is how you define a common one to convert
a value of a type to a String
representation. It uses Lean 4's
string interpolation feature which is nice and concise.
Finally we can define run
:
def run : IO Unit := do
let wc <- IOfoldl countChar emptyWordCount
IO.println wc
And update our Main.lean
to look like this:
import WordCount
def main : IO Unit := run
We can then go to the command line and build our project from the project root:
$ lake build
And we can run it like this:
$ echo "The quick brown fox" | ./build/bin/WordCount
Characters: 19 / Words: 4 / Lines: 1
Which we can observe is similar to the results we would get with the
traditional wc
program packaged with most Unix-like systems. What
may vary is the value of the "line count." Depending on which wc
you use this will be 0
or 1
but neither is wrong. In our program
we decided that even if we do not encounter a newline character, we
consider there to always be at least 1 line of input.
We can write simple, standard Unix-like programs in Lean 4.
We learned how to define structure
data structures, pure functions,
partial functions, and how to read and write to streams in IO
.
Our program is also rather concise. It is also not too far off in
performance compared to my systems wc
program! In my totally
non-scientific benchmark, I use
https://www.gutenberg.org/files/2701/2701-0.txt as input for both
programs.
My system's wc
returns:
$ cat ~/Downloads/moby_dick.txt | time wc
22316 215864 1276235
0.01user 0.00system 0:00.01elapsed 94%CPU (0avgtext+0avgdata 2120maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
And our Lean 4 version:
$ cat ~/Downloads/moby_dick.txt | time ./build/bin/WordCount
Characters: 1276235 / Words: 215864 / Lines: 22317
0.03user 0.00system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 9908maxresident)k
0inputs+0outputs (0major+1251minor)pagefaults 0swaps
This is on a Intel© Core™ i5-5300U CPU @ 2.30GHz × 2
with 8GB of RAM
on 5.4.0-89-generic
of Linux.
We get this without any optimization efforts on our part. Not too
bad! Notice we do get a tiny bit more memory usage, to be expected as
we're doing some work with reference counting that my system wc
doesn't do. Also worth noting that this isn't a POSIX compliant
implementation of wc
and isn't doing quite as much work to handle
different input types and options that would be required. However it
seems reasonable that with a little more effort we could be POSIX
compliant and maintain a respectable amount of performance.