A Lisp compiler for ARM written in Lisp

johnsondavies · 2024-08-06 18:52:31 UTC

Introducing a simple experimental Lisp compiler, written in uLisp, that will compile a Lisp function into ARM machine code. My aim was to make the compiler simple enough so that its listing will fit on a single sheet of A4 paper.

For a set of example programs, and a commented listing of the compiler, see the full article: A Lisp compiler for ARM written in Lisp.

Introduction

When I added the facility of executing machine code to uLisp I had in mind the eventual goal of being able to compile uLisp functions into machine code, and this is a first step in that direction.

The nice thing about compiling Lisp is that you don’t have to write a tokeniser or parser, because Lisp programs are already in a consistent structure that can be processed by another Lisp program.

The compiler program is written in the subset of Common Lisp supported by uLisp, and will run on an ARM board with at least 5000 objects of workspace. I used a Seeed Studio XIAO nRF52840. You can also run it using Common Lisp on a laptop or desktop computer, and display the code it generates, but of course you won’t be able to run the ARM machine code because Common Lisp doesn’t have uLisp’s defcode command.

Although I wrote this for ARM boards it should be relatively straightforward to convert it to generate AVR or RISC-V machine code.

Using the compiler

To use this compiler you simply call compile on a Lisp function; for example:

(compile 'fibonacci)

The function will be compiled into a machine code function, replacing the original Lisp code, so that calling fibonacci will now execute the ARM machine-code version.

You can also display the code generated for an expression by calling comp on the expression; for example:

> (pprint (comp '(* 13 17)))

(:integer
  ($mov 'r0 13)
  ($push '(r0))
  ($mov 'r0 17)
  ($push '(r0))
  ($pop '(r0 r1))
  ($mul 'r1 'r0)
  ($push '(r1)))

Specification

The compiler understands the following Lisp commands:

Defining a function: defun

Symbols: nil, t

Arithmetic functions: +, -, *, logand, logior, logxor

Arithmetic comparisons: =, <, <=, >, >=, /=

Conditionals: if, and, or

Evaluation: progn, setq

Limitations

This first version of the compiler has a few limitations, including:

It can only compile functions of up to four integer arguments, and returning an integer result.
It assumes that all arithmetic and logical functions take exactly two arguments.
It assumes that the function you are compiling is small enough so that branch instructions can be used for jumps within the function.
There is relatively little error checking.
It generates inefficient code compared to what you could write by hand.

Some of these limitations would be simple to remedy; some of them are more fundamental.

To use the compiler you first need to load the ARM assembler, written in Lisp, from: ARM assembler in uLisp.

For more information about the assembler see ARM assembler overview.

How the compiler works

Register usage

To avoid needing to keep track of register usage the compiler makes use of the stack to pass values to an expression, and store the value returned by an expression.

The following table shows how the ARM registers are used within the compiler:

Registers	Use
r0 r1 r2 r3	Used to pass the parameters to the main function’s arguments.
r0	Contains the value returned by the main function.
r4 r5 r6 r7	Contain copies of the function arguments within the function.
r0 r1	Used to pass the arguments to each operator.
r1	Used to return the value from each operator.
r2	Used to return the true/nil value from comparisons.

Compiling an expression

The following steps show the sequence of compiling an expression, such as:

(* x 13)

Code is generated to evaluate each of the arguments, in this case x and 13, and each result is pushed onto the stack.
The two values are then popped from the stack into registers r0 and r1.
The function, in this case *, is then evaluated for r1 and r0, with the result in r1.
This result in r1 is then pushed onto the stack.

This stack-based approach ensures that a more complex expression, such as:

(* (- x 1) (+ x 13))

will also compile into correct code, without conflicts between registers.

Calling the function recursively

The compiler supports calling a function recursively from within the function itself. Because the registers corresponding to the parameters and local variables would be overwritten by the recursive call they are stored on the stack around the function call.

Types

For boolean operations I decided to represent nil as zero, and t as 1. A problem I hadn’t anticipated was that I would need to keep track of what type of object each function returned, integer or boolean. For example, consider the problem of compiling the statement:

(and x y)

If x has the value 0 and y has the value 7 this should return 7. However, if x has the value nil and y has the value 7 this should return nil. Representing nil as zero leads to an ambiguity.

I solved this by returning a type, :integer or :boolean, with each compiled expression, according to the following rules:

Predicates, and t or nil, always return a :boolean.
Arithmetic operations always return an :integer.
A if form requires a :boolean test form and returns an :integer.
A progn or let block returns the type of its last expression.

Examples

Here is one of the example programs I used to test the compiler:

Takeuchi function

This is the standard highly-recursive benchmark I use for comparing versions of Lisp; see Benchmarks - Takeuchi function:

(defun tak (x y z)
  (if (>= y x)
      z
    (tak
     (tak (- x 1) y z)
     (tak (- y 1) z x)
     (tak (- z 1) x y))))

Lisp version:

> (time (tak 18 12 6))
7
Time: 8.9 s

Compiled version

> (time (tak 18 12 6))
7
Time: 68 ms

For further example programs, and a commented listing of the compiler, see the full article: A Lisp compiler for ARM written in Lisp.

fitzsim · 2024-08-06 20:07:54 UTC

Wow, so cool, great work! You could also implement disassemble.

JackBeNimble · 2024-08-06 22:17:24 UTC

Awesome, thanks so much! I will have to give it a try. Risc-V for sure, the $3 chip needs a lisp compiler to run some simple apps in 4k flash.

andreer · 2024-08-11 18:36:41 UTC

Very nice!

johnsondavies · 2024-08-21 11:59:24 UTC

I’ve released a Version 2 of the compiler which has the following improvements over the original, while keeping the size of the program unchanged:

It compiles expressions to leave the result in a register (r0) rather than push it on the stack, which results in more efficient code.
It includes support for the following additional Lisp functions: 1+, 1-, car, and cdr.
There is better error checking.

I’ve also included some additional example programs.

See: A Lisp compiler for ARM written in Lisp.

AnatolySch · 2024-08-23 20:59:27 UTC

Fantastic! The test results are shocking. A great achievement.

johnsondavies · 2024-08-28 14:24:35 UTC

Here’s a small improvement to the Lisp compiler, with a discussion.

If you try to compile a function such as:

(defun tst (x) (setq x 256))

you’ll get the error from the Lisp ARM assembler:

Error: Won't fit

The reason is that the compiler is invoking the ARM Thumb-1 instruction:

MOV r0, #256

which is invalid because the immediate value can only be in the range 0 to 255. So what are the most efficient solutions for supporting constants k up to 32 bits wide?

The most efficient solution for constants between 0 and 255 is the MOV instruction as already used (2 bytes):

MOV r0, #k    ; k is 0 (0x00000000) to 255 (0x000000FF)

For constants between -1 and -255 we can solve the problem in 4 bytes by combining the MOV instruction with a Thumb-1 NEG (negate) register-to-register instruction:

MOV r0, #-k    ; k is -1 (0xFFFFFFFF) to -255 (0xFFFFFF01)
NEG r0, r0

For immediate values of up to 32 bits we can use the Thumb-1 PC-relative LDR instruction, put the 32 bit value in-line in the code, and branch around it, which needs 8 bytes:

	   LDR r0, const
	   B jump
const  .word k
jump   ...

If you find this a bit inelegant, on processors supporting Thumb-2 you can cater for 32-bit constants by specifying 16 bits at a time with a MOVW instruction followed by a MOVT instruction, but there’s really no advantage as it’s still a total of 8 bytes):

MOVW r0, #klow   ; k is 0 (0x00000000) to 65535 (0x0000FFFF)
MOVT r0, #khigh  ; k is 0 (0x00000000) to 65535 (0xFFFF0000)

To implement this improvement make the following two changes to the Lisp compiler source at:

Lisp compiler for ARM

Replace this line in comp:

((atom x) (type-code :integer (list (list '$mov ''r0 x))))

with:

((atom x) (comp-integer x env))

and add comp-integer defined as follows:

(defun comp-integer (x env)
  (type-code
   :integer
   (cond
    ((<= 0 x 255) (list (list '$mov ''r0 x)))
    ((<= -255 x -1) (list (list '$mov ''r0 (- x)) '($neg 'r0 'r0)))
    (t (list '($ldr 'r0 (+ *pc* 4)) '($b (+ *pc* 6)) (list '$word x))))))

which implements the above strategy. It avoids the need for the two labels by using PC-relative operands.