-
Notifications
You must be signed in to change notification settings - Fork 36
/
arpilisp.s
3185 lines (2297 loc) · 80.1 KB
/
arpilisp.s
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
/* -*- mode: asm; fill-column: 88; tab-width: 8; -*- */
/*
Get Started with Arpilisp
Side note: if this file looks funny, use a text editor that displays a
fixed-width font and uses 8 spaces for tabs.
Arpilisp is the Assembled Raspberry Pi Lisp. This tutorial uses assembly
language, the lowest level programming language, to implement an interpreter for
Lisp, the highest level language. Arpilisp implements Lisp from scratch, with
no help from any libraries and minimal help from the kernel.
Before you use arpilisp, you need to build it. To build arpilisp, you need
a few things:
* A Raspberry Pi running Raspbian
* That's it.
To build arpilisp, type the following in a shell:
gcc -nostdlib -o arpilisp arpilisp.s
This tells the GNU compiler (gcc) to use arpilisp.s, this file that you are
reading, as input, to build an executable file named arpilisp. Later we
cover what the -nostdlib option does.
To use arpilisp itself, type this:
./arpilisp
To quit arpilisp, type Ctrl-D.
References
https://www.raspberrypi.org
https://www.raspbian.org
________________________________________________________________________________
Acknowledgements
Thanks to Richard W.M. Jones, John McCarthy, and Jack W. Crenshaw for showing us
the simplicity and elegance hiding in complex things, and for reminding us that
computers are awesome.
Thanks also to Chris Hinsley and Jay Sissom for feedback.
________________________________________________________________________________
The Shortest Introduction to Lisp
Lisp is one of the oldest programming languages in computing. John McCarthy and
his team implemented the first Lisp interpreter in 1960. They wanted a
programming language for AI research. Since then, he and many others discovered
amazing things about how Lisp can make computers do surprising, interesting
things.
Most people acknowledge Lisp's influence without knowing what it is exactly.
Others are turned off by Lisp's weirdness. Lisp continues to influence
programming languages today. It's not as inaccessible, alien, or irrelevant as
you might believe.
Lisp is an acronym for LISt Processor. A list is a sequence of items. To
specify a list, we start with an opening parenthesis, continue with the things
in the list, and end with a closing parenthesis.
For example, here's a list of ingredients for a salad:
(lettuce tomato oil vinegar)
We can use extra spaces to clarify what we type but Lisp doesn't care about
excessive white space between list items, as long as there is some. And there's
no need to put white space around parentheses. Here are some lists that look
different to us but mean the same to Lisp:
(lettuce tomato oil vinegar )
(lettuce
tomato
oil
vinegar)
A list may itself also contain lists. Here's a list that is different from our
previous list:
((lettuce tomato) (oil vinegar))
While it contains the same items, they are arranged into two sub-lists, the
first for the vegetables (lettuce tomato), the second for the dressing (oil
vinegar).
Here's an empty list:
()
The empty list comes up so much that Lisp has a symbol for it:
nil
A symbol is just a name that represents something. What the symbol represents
depends on you, the programmer. In our list above, lettuce, tomato, oil, and
vinegar are symbols for salad ingredients. We sometimes refer to symbols as
atoms. Unlike lists, atoms are not composed of parts.
In Lisp, lists and atoms are called "S-expressions". We also use just
"expression" to mean the same thing.
The Processor part of LISt Processor means that we program a Lisp
interpreter to manipulate lists. We manipulate lists with functions that accept
S-expression arguments.
Here's a cool thing about Lisp: functions are also written as S-expressions. In
other words, data and programs have the same form.
Side note: A lot of people make a big deal about the interchangeability of Lisp
data and programs. That's certainly remarkable and accounts for a lot of Lisp's
elegance and expressiveness. The irony is that after getting used to this idea,
you find it odd that other programming languages don't have this feature.
To see how functions work, let's take a closer look at nil and the empty list.
To verify that they are the same, we enter this expression in Lisp:
(eq nil () )
The first item in the list of our expression is eq, which is a function. A
function tells Lisp what needs to be done: we want to determine equality. The
remaining items in the list, nil and (), specify the data that we want the eq
function to process.
Lisp responds with:
t
The t symobl means that the result of this expression is true; nil and () are
indeed equal. Using the t symbol to represent "true" is a Lisp convention, while
nil is the conventional value for false.
In Lisp, we say "function application" to mean applying a function to arguments.
In our example, we apply eq to two arguments. The function compares them for
equality, then returns a "t" for true or "nil" for false.
Before Lisp applies a function to its arguments, it first evaluates each
argument. For example, let's take a look at a more complicated expression:
(eq
(eq nil ())
(eq () nil))
In this expression, we give eq these arguments:
(eq nil ())
and
(eq () nil)
We can't determine how to compute the value of the first eq application until we
know the values of its arguments. Behind the scenes, that's what Lisp does. If
we could watch the internal computation of our S-expression, we would see Lisp
first compute then substitute the values of the arguments.
This substitution renders our expression from this:
(eq
(eq nil ())
(eq () nil))
to this:
(eq
t
(eq () nil))
then this:
(eq
t
t)
and finally:
t
Lisp evaluates expressions recursively by computing the values of the deepest
argument expressions before computing arguments that are higher up.
Lisp has some useful built-in functions. An important function is quote. It
takes a single argument and returns it unevaluated.
For example, when we type our salad list into Lisp:
(lettuce tomato oil vinegar)
Lisp returns an error about "lettuce" not being a function.
To make it clear that our salad list is a list and not an applicaton of the
lettuce function, we enter:
(quote (lettuce tomato oil vinegar))
Lisp responds with:
(lettuce tomato oil vinegar)
Lisp wouldn't be useful if we couldn't create our own functions. To describe a
function, we use a lambda expression. A lambda expression takes a list of
parameters then a sequence of S-expressions to evaluate.
Here's an example of a lambda function that accepts a single parameter to
compute its equality with nil:
(lambda (x) (eq nil x))
Each parameter in the parameter list is a symbol that represents an argument at
application time. The sequence of S-expressions in the lambda may refer to
these parameters. In fact, it's good practice to make sure that the
S-expressions in a lambda refer only to the lambda parameters.
Lisp treats the last S-expression in the lambda specially. Its value is the
value that the lambda returns when Lisp applies it.
In our lambda above, (x) is the list of parameters. In this case, we have a
single parameter, x. The S-expression (eq nil x) is the only expression in our
lambda. It's also the last expression, so the lambda returns the value of this
expression when we apply the lambda.
When you enter a lambda by itself:
(lambda (x) (eq nil x))
Lisp returns the lambda, unapplied:
(lambda (x) (eq nil x))
If we want to apply our lambda to an argument, we need to use the same form as a
function application:
(function argument ...)
We just need to replace function with a lambda expression.
For example, if we enter:
((lambda (x) (eq nil x))
(quote (lettuce tomato oil vinegar)))
Lisp applies our lambda like a regular function application by following these
steps:
1. Evaluate the expressions of the arguments.
2. Bind each evaluated argument, in the order it appears in the application, to
each of the parameters in the order that they appear in lambda parameter list.
3. Evaluate each expression in the lambda. When an expression refers to a
parameter, Lisp evaluates it by substituting its bound value from step 2.
4. Return the value of the last expression in the lambda and unbind its
parameters.
In our example above, in step 1, Lisp first evaluates the lone argument:
(quote (lettuce tomato oil vinegar))
which gives:
(lettuce tomato oil vinegar)
In step 2, Lisp binds this evaluated argument to the parameter, x.
For step 3, Lisp evaluates the lambda body, replacing occurrences of x with the
value it is bound to. This:
(eq nil x)
becomes:
(eq nil (lettuce tomato oil vinegar))
Our argument, (lettuce tomato oil vinegar), is not nil, so eq returns nil. Since
this is the only and last expression in the lambda, the lambda application
returns nil. Returning nil for a non-nil argument is ironic until you remember
that nil means false in this case.
The parameter bindings in a lambda application last only during the application
of the lambda. In step 4, Lisp unbinds the lambda's parameters. The x argument
has no value.
So entering this expression after applying our lambda:
x
gives an error about an unbound variable.
Outside of a lambda, we can bind a symbol to a value so that when you enter the
symbol, Lisp returns the value. For example this expression binds "name"
to Valerie:
(define name (quote Valerie))
So entering this expression:
name
evaluates to what you would expect:
Valerie
You can also change an existing binding:
(define name (quote Isabelle))
Bindings in a lambda temporarily override outside bindings. For example:
(define name (quote James))
((lambda (name) name) (quote Rose))
returns:
Rose
Inside the lambda application, the name symbol is bound to Rose. When the
lambda application returns, the previous binding to name is restored, so that
entering this expression:
name
returns this:
James
Of course, you can also bind a lambda to a symbol, which makes the lambda easier
to use if you intend to refer to it frequently. For example, comparison with
nil is something we see enough that we define a handy function for it:
(define null (lambda (x) (eq x nil)))
Now comparison to nil is more convenient:
(null (quote Gus))
returns:
nil
Notice that the define function plays by different rules than a normal function.
Instead of evaluating its first argument, define takes it literally, as if it
were quoted. Therefore by definition (ahem), define isn't a true function. In
Lisp, we call this a "special form". If you paid enough attention earlier, you
noticed that quote and lambda are also special forms. Lambda doesn't evaluate
its list of parameters. And Lisp delays the evaluation of a lambda's expressions
until it applies the lambda.
Another special form is cond, which is short for "conditional". It takes a list
of clauses.
(cond clause1 clause2... clauseN)
Each clause is a list of 2 expressions:
(test result)
If the test expression is true, then cond returns the value of the corresponding
result expression and stops processing subsequent clauses. If the test is
false, cond proceeds to the next clause. It continues doing so until it finds a
test that returns true or there are no more expressions.
For example, to evaluate this expression:
(cond
((eq name (quote Valerie)) (quote funny))
((eq name (quote James)) (quote silly))
(t (quote goodbye)))
Lisp starts with the first clause:
((eq name (quote Valerie)) (quote funny))
which has this test expression:
(eq name (quote Valerie))
Given our most recent binding for name, this test evaluates to false. Lisp
skips to the next clause:
((eq name (quote James)) (quote silly))
The test expression evaluates to true. So Lisp evaluates the second
expression in this clause, which returns:
silly
which becomes the value of our cond expression. Lisp ignores subsequent
clauses, so the clause:
(t (quote goodbye))
doesn't get evaluated.
As a matter of good habit, we always put a final clause in a cond that has t for
a condition clause's test. That way we assure ourselves that a cond test
returns a value that we specify when all other clause tests are false.
Lisp isn't all special forms. In fact, there are only a handful. Most Lisp
built-in functions evaluate arguments normally.
For example, the most frequently-used Lisp functions are car and cdr. For now,
we partly describe what they do, with full details of their awesomeness later.
The car function returns the first item in a list and cdr returns a list without
its first item. Helpful synonyms for these functions are "first" and "rest",
respectively.
For example, let's define our salad list:
(define salad (quote (lettuce tomato oil vinegar)))
Entering this:
(car salad)
returns this:
lettuce
And entering
(cdr salad)
returns
(tomato oil vinegar)
To pick individual items in a list, we combine car and cdr. Entering:
(car (cdr salad))
gives us:
tomato
This expression:
(car (cdr (cdr salad)))
gives us:
oil
Note that car and cdr do not modify the list that is bound to salad. Car
doesn't reduce a list to its first item, it only tells you what that first item
is. Likewise, cdr doesn't remove the first item from a list, it only tells you
what a list is without its first item. So after applying car and cdr to salad,
salad still has its most recent binding. Entering:
salad
still gives us:
(lettuce tomato oil vinegar)
Functions that don't modify bindings other than their own are called "pure
functions", which is another way of saying that you, the Lisp user, don't have
to worry about unintended consequences when applying a function. Function purity
is an important and useful quality in Lisp programming.
So that's most of the basics of Lisp that we implement in arpilisp. We'll cover
the rest as we go.
References
McCarthy, "Recursive Functions of Symbolic Expressions and Their Computation by
Machine, Part I". MIT. 1960.
McCarthy, et al. "Lisp I Programmer's Manual." MIT. 1960.
________________________________________________________________________________
The Shortest Introduction to ARM Assembly Language
Today's computers have ample capacity to let us pile on layers of operating
systems, shells, libraries, frameworks, sandboxes, and containers. So much that
we've successfully buried assembly language completely out of mind. But the
machine is still there, and not as inaccessible, alien, or irrelevant as you
might believe.
The Raspberry Pi uses an ARM processor. The ARM registers that we use in
arpilisp are 32 bits wide. Registers r0 to r12 are general-purpose and registers
r13 to r15 are single-purpose, at least for our purposes. Register r13 is the
stack pointer (sp), r14 is the link register (lr), and r15 the program counter
(pc).
An extra register, the processor status register (apsr), offers a few single-bit
condition flags that optionally record effects of previous instructions.
To manipulate data, like arithmetic, comparison, bit fiddling, and
so on, ARM requires that we use registers. ARM has no instructions to
manipulate data in memory. Before we operate on something, we first need to
load it from memory to a register. In fact, we can't even directly load from or
store to memory. We need to load an address into a register then use that
register to specify the location of a load (ldr) or store (str) operation.
Side note: Yes, loading an address before loading the contents of the address
seems like a chicken and egg situation. It's not, of course. The assembler and
the processor play some convenient tricks to solve this problem for us.
In GNU assembler, constants begin with a pound sign (#). Addresses start with an
equal sign (=).
Labels end with a colon (:). Labels must be unique with one exception: numeric
labels. An assembly file can repeat numeric labels, which makes them handy for
throw-away labels in an assembly procedure. An instruction refers to a
numerical label by post-fixing an "f" or "b", which refers to the matching
numerical label "forward" or "before" the instruction.
ARM, the company, and others specify calling conventions and ABIs, but we'll
ignore them to keep things simple. Arpilisp is for learning Lisp and assembler,
not for conforming.
References
https://en.wikipedia.org/wiki/IBM_704
ARM Limited. "ARMv6-M Architecture Reference Manual." 2010.
________________________________________________________________________________
Targeting the Processor
Let's get started. We begin at the foundation then go on until we get to the
end: a working Lisp interpreter.
We target the ARMv6 instruction set, which works with all variants of the
Raspberry Pi up to the Pi 3.
*/
.arch armv6
/*
________________________________________________________________________________
Procedures in Assembly
These assembler macros tell the assembler, the debugger, and us where a
procedure starts and ends. They also make sure that the assembler aligns
instructions properly and places constant values that our procedure refers to in
a place that the procedure can reach.
Reference
https://community.arm.com/docs/DOC-9652
*/
.macro PROC name
.text
.balign 4
.func \name, \name
\name\():
.endm
.macro ENDPROC
.pool
.endfunc
.endm
/*
________________________________________________________________________________
Errors
We need a way to flag an error. We'll use the overflow (V) flag in the apsr.
This gives us a simple mechanism to set and detect error conditions. Reacting
to an error is as simple as appending the "vs" or "vc" condition code to
affected instructions.
Our error-flagging method is blunt--it clears all condition flags along with
setting or clearing V. As long as we keep that in mind, we won't get stung.
*/
.macro ERRSET errsym
msr apsr_nzcvq, #1<<28
ldr r7, =\errsym
.endm
.macro ERRCLR
msr apsr_nzcvq, #0
.endm
/*
________________________________________________________________________________
Returning boolean results
We use the Z flag in the status register to handle true and false results. When
we set the Z flag (and bluntly clear other conditions), we can test for it with
the EQ condition flag. Otherwise, we clear the status registers to use the NE
condition.
*/
.macro ZSET
msr apsr_nzcvq, #1<<30
.endm
.macro ZCLR
msr apsr_nzcvq, #0
.endm
/*
________________________________________________________________________________
Lisp stack
We need a safe place to store intermediate Lisp values. We use a stack for
this. We cover the idea of "safe" later on.
We reserve some memory for the Lisp stack and r6 for the Lisp stack pointer.
This stack pointer is separate from processor's stack pointer, aka sp, aka r13.
Side note: The Linux kernel conveniently sets up sp for us and reserves some
memory for the process stack when it launches arpilisp.
*/
.equiv LISPSTACKMAX, 1000 /* Number of pointers to Lisp objects. */
.bss
.balign 4
lispstackbottom:
.space LISPSTACKMAX * 4
.equiv lispstacktop, .
/*
Push the Lisp value in r0 onto the Lisp stack. Signal an error if the stack is
full. Generating an error instead of panicking is fine because it lets our
interpreter gracefully recover from exhausted Lisp stack space.
*/
PROC pushlisp
push { r1 }
ldr r1, =lispstackbottom
cmp r6, r1
bne 10f
ERRSET errstackfull
b 999f
10: stmfd r6!, { r0 }
ERRCLR
999: pop { r1 }
mov pc, lr
ENDPROC
/*
Pop from the Lisp stack and store it in r0. Modifies the condition flags.
Panic if we try to pop from an empty stack. We panic instead of generating an
error because a pop requires a previous push. In other words, a mismatched pop
implies a bug in the interpreter's logic, not the Lisp program that it is
interpreting.
*/
PROC poplisp
ldr r0, =lispstacktop
cmp r6, r0
bne 10f
ldr r0, =panicstackempty
b panic
10: ldmfd r6!, { r0 }
mov pc, lr
ENDPROC
/*
________________________________________________________________________________
The mighty cons
Lisp lists are implemented as cons cells. The cons cell is the most elegant and
expressive data type ever discovered in computing. If symbols are the atoms in
the Lisp universe, then cons cells organize the universe into molecules,
galaxies, and everything in between.
A cons cell is a pair of pointers. Incidentally, "pair" is often used as a
synonym for a cons cell.
Remember car and cdr, the functions that return different parts of a
list? It turns out that these weird function names are based on the traditional
names for the pointers in the cons cell.
cons cell
+-----+-----+
| car | cdr |
+-----+-----+
It also turns out that stating that the car and cdr functions operate on lists
is true only at a superficial level. What these functions really do is operate
on cons cells. This makes implementing the car and cdr functions dead easy:
return the pointer to the object stored in the car or cdr part of the cell,
respectively.
In a cell, the car and cdr pointers may each point to another cell, a symbol, or
nil.
For example, here is a cell where the car points to a symbol and the cdr points
to nil.
+-----+-----+
| car | cdr ---> nil
+--|--+-----+
|
v
symbol
For simplicity of implementation, a practical value for nil is 0. We store
nothing useful at memory location 0.
Side note: In fact, a Linux user program cannot store or load location 0. The
ARM Linux kernel doesn't allow it by design. Doing so results in a segmentation
fault, which is an intentional thing.
Reference
https://www.kernel.org/doc/Documentation/arm/memory.txt
*/
.equiv NIL, 0
/*
We keep all our cells in a pool. A cell occupies 2 words = 2 * 4 bytes = 8
bytes.
*/
.bss
.balign 4
.equiv CELLMAX, 10000
cells:
.space CELLMAX * 2 * 4
.equiv cellsend, .
/*
A Lisp program can ignore its obsolete data by simply abandoning it. From a
Lisp programmer's point of view, memory is infinite. In fact, Lisp provides no
explicit way for a program to deallocate memory.
Side note: I know that you know that I know that limitless memory is impossible.
And it's easy to write a Lisp program that quickly exhausts memory. I'm just
saying that limitless memory is part of the Lisp programming model.
In assembly language, memory is definitely limited. We have to meticulously and
precisely manage memory usage. When we no longer need it, we have to carefully
track it to reuse it later.
So it's up to our interpreter to implement automatic memory management. When a
Lisp program runs out of memory, the interpreter must look for abandoned memory
to reuse. For this, McCarthy's team invented the term "garbage collection". The
name stuck.
The first Lisp used mark-sweep garbage collection. So do we because it is
simple to describe and implement. Also, we limit ourselves to collect abandoned
cons cells. We won't bother with abandoned symbols.
There are two ingredients to marking and sweeping: a free list and a root set.
First ingredient: the free list. The free list points to abandoned cells that
are ready to use. When a Lisp program needs a new cons cell, Lisp removes it
from the free list and gives it to the program.
*/
freelist:
.word NIL
/*
When we start the interpreter, we have to build the free list. We just iterate
through the cell pool, pointing each cons cell's cdr to the next cons cell. At
the last cons cell, we point to nil.
When it's initialized, our free list initially looks like this:
+-----+-----+ +-----+-----+ +-----------+
freelist --->| car | cdr --->| car | cdr ---> ... --->| car | cdr --->nil
+-----+-----+ +-----+-----+ +-----+-----+
*/
PROC initfreelist
push { r0-r2, lr }
ldr r0, =cells
ldr r1, =freelist /* Free list starts at the beginning */
str r0, [ r1 ] /* of the cell pool. */
ldr r2, =cellsend
sub r2, r2, #8 /* Stop before the last cell. */
1: mov r1, r0
add r0, r0, #8 /* The next cell. */
str r0, [ r1, #4 ]
cmp r0, r2
bne 1b
pop { r0-r2, pc }
ENDPROC
/*
Second ingredient in garbage collection: the root set. The root set is the
starting point for marking. Lisp follows the cells pointed to by the cells in
the root set, travelling through all referred cells.
In other words, all cells that are reachable from the root set, directly or
indirectly, are safe from garbage collection. Cells that are not reachable are
added to the freelist.
The garbage collector starts by marking all used cells. This is the mark in
mark-and-sweep.
During the mark phase, Lisp could encounter a cell that it has already
marked. In this case, Lisp stops chasing further because we know that
subsequent, reachable cells from a marked cell are already marked.
To mark a cons cell, we must make sure not to corrupt the original data in the
cell. We do this by taking advantage of a quirk in ARM's data alignment
preferences. Genuine ARM pointers are aligned to the nearest word (4 bytes), so
a pointer's 2 least significant bits are always 0.
ARM pointer
+--------+--------+-- --+-------+-------+
| 0 or 1 | 0 or 1 | ... | 0 | 0 |
+--------+--------+-- --+-------+-------+
bit 31 bit 30 bit 1 bit 0
We take advantage of these 2 bits to encode information about a Lisp object in
the actual pointer to the object.
To specify that a cell is in use during the mark phase, we mark its car by
settings bit 1 to 1. We have another use for bit 0, covered later.
We don't need to worry about decoding a marked cell to refer to its pointer for
a few reasons: our Lisp program is suspended during mark and sweep, we don't follow
reachable cells from a marked cell, and we only use the mark bit during the
marking phase of garbage collection.
*/
.equiv MARKMASK, 0b10
PROC collectgarbage
push { r0, r6, lr }
/* Mark the root set. */
mov r0, r9 /* The environment. */
bl mark
mov r0, r8 /* The expression being evaluated. */
bl mark
mov r0, r7 /* The value. */
bl mark
mov r0, r5 /* The list of lambda argument values. */
bl mark
ldr r0, =freelist
ldr r0, [r0]
bl mark
/* Mark the Lisp stack. */
10: ldr r0, =lispstacktop
cmp r6, r0
beq 20f
ldmfd r6!, { r0 }
bl mark
b 10b
20: bl sweep
pop { r0, r6, pc }
ENDPROC
/*
Given a pointer in r0, mark it if applicable.
*/
PROC mark
push { r0-r3, lr }
/* Does r0 point to a cell? */
1: cmp r0, #NIL
beq 999f
tst r0, #SYMMASK
bne 999f
/* Is the car already marked? */
ldr r1, [ r0 ]
tst r1, #MARKMASK
bne 999f
/* Mark the car. */
mov r2, r1 /* Mark with r2, save r1. */
orr r2, r2, #MARKMASK
str r2, [r0] /* Store the mark in car. */
/* Follow the data pointed to by the car and cdr. */
mov r3, r0 /* Save original pointer in r3. */
mov r0, r1 /* Chase after the car. */
bl mark
ldr r0, [ r3, #4 ] /* Chase after the cdr. */
b 1b
999: pop { r0-r3, pc }
ENDPROC
/*
With marking completed, the interpreter starts sweeping. It iterates through
the cell pool, unmarking active cells and inserting inactive cells into the free
list.
*/
PROC sweep
push { r0-r1, lr }
ldr r1, =cells
1: ldr r0, [ r1 ] /* Is the car marked? */
tst r0, #MARKMASK
beq 2f
/* Cell is active, unmark it. */
bic r0, #MARKMASK
str r0, [ r1 ]
b 3f
/* Cell is inactive, add it to freelist. */
2: mov r0, #0 /* Clear the car. */
str r0, [ r1 ]
ldr r0, =freelist /* Store freelist head in the cdr. */
ldr r0, [ r0 ]
str r0, [ r1, #4 ]
ldr r0, =freelist /* Point freelist to the new head. */
str r1, [ r0 ]
/* Next cell in the cell pool. */
3: ldr r0, =cellsend
add r1, r1, #8
cmp r1, r0
bne 1b
pop { r0-r1, pc }
ENDPROC
/*
Lisp provides the cons function to allocate new cells. A Lisp program uses cons
to construct its data. This is where the free list comes in.
+-----+-----+ +-----+-----+ +-----------+
freelist --->| car | cdr --->| car | cdr ---> ... --->| car | cdr --->nil
+-----+-----+ +-----+-----+ +-----+-----+
cell 1 cell 2 cell N
Allocating a cons cell from the freelist is simple. We just "pop" it from the
freelist:
allocated cell
|
V
+-----+-----+ +-----+-----+ +-----------+
freelist -+ | car | cdr | +->| car | cdr ---> ... --->| car | cdr --->nil
| +-----+-----+ | +-----+-----+ +-----+-----+
| cell 1 | cell 2 cell N
| |
+-----------------+
Given pointers to Lisp objects in r1 and r2, return an allocated cell in r0 such
that its car contains r1 and cdr contains r2. Otherwise, panic if there is no
memory or issue an error if the Lisp stack is full.
*/
PROC cons