A Lisp compiler to ARM written in Lisp (2)

johnsondavies · 2025-09-08 06:59:39 UTC

About a year ago I published a simple experimental Lisp compiler, written in uLisp, that compiled a Lisp function into ARM machine code.

This is a development of that compiler, with the following additional features:

Local variables can be created with let and let*.
Loops are supported using loop, dotimes, and return.
In addition to if it now provides the conditional statements when and unless.
The functions first and rest can be used as synonyms of car and cdr.
The function nth can be used to return the nth item of a list.
The integer arithmetic functions / and mod are now supported.
Unary - and not are provided.
The unary predicates zerop, plusp, minusp, oddp, and evenp are supported.
The shift function ash is supported.
It takes advantage of ARM Thumb-2 instructions movw and movt to handle large immediate values, sdiv and mls to provide / and mod, and cbz, cbnz, and it to give more compact code.
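As a small illustration of the forms listed above, here is a function in ordinary Lisp that uses let*, dotimes, unless, and evenp together. The name sum-odd-squares is made up for this sketch, and it has not been tested against the compiler; it only shows the kind of code the new features are aimed at.

```lisp
;; Hypothetical example: sum the squares of the odd numbers from 1 to n.
;; Uses let*, dotimes, unless, and evenp from the feature list above.
(defun sum-odd-squares (n)
  (let* ((total 0)
         (limit (+ n 1)))          ; dotimes counts from 0 to limit-1
    (dotimes (i limit)
      (unless (evenp i)
        (setq total (+ total (* i i)))))
    total))
```

It runs as written in Common Lisp, so the result of the interpreted version can be checked on a desktop before trying it on a board.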

Because it uses Thumb-2 instructions, this version of the compiler requires a board with an M4, M33, or later ARM processor. These include boards based on the ATSAMD51, RP2350, nRF52840, or RA4M1. Unlike the original version of the compiler it won’t work on boards with an M0 or M0+ ARM processor such as the ATSAMD21, RP2040, or the nRF51822.

The original article gave a series of example programs that you could compile with the compiler. In this article I give several additional examples that make use of the additional features in this version of the compiler.

Introduction

When I added the facility of executing machine code to uLisp I had in mind the eventual goal of being able to compile uLisp functions into machine code, and this is a first step in that direction.

The nice thing about compiling Lisp is that you don’t have to write a tokeniser or parser, because Lisp programs are already in a consistent structure that can be processed by another Lisp program.

The compiler program is written in the subset of Common Lisp supported by uLisp, and will run on an ARM board with an M4, M33, or later ARM processor, and with at least 5000 objects of workspace. I used a Circuit Playground Bluefruit. You can also run it using Common Lisp on a laptop or desktop computer, and display the code it generates, but of course you won’t be able to run the ARM machine code because Common Lisp doesn’t have uLisp’s defcode command.

I got my initial inspiration for this compiler from Peter Norvig’s book “Paradigms of Artificial Intelligence Programming”.

Here’s the source of this version of the compiler: Lisp compiler for ARM 2.

To use the compiler you also need to load the ARM assembler from: ARM assembler in uLisp.

For more information about the assembler see ARM assembler overview.

Using the compiler

To use this compiler you simply call compile on a Lisp function; for example:

(compile 'factor)

The function will be compiled into a machine code function, replacing the original Lisp code, so that calling factor will now execute the ARM machine-code version.

You can also display the code generated for an expression by calling comp on the expression; for example:

(pprint (comp '(* 13 17)))

(:integer
  ($mov 'r0 13)
  ($push '(r0))
  ($mov 'r0 17)
  ($pop '(r1))
  ($mul 'r0 'r1))

Later in this article I give examples of several simple Lisp programs that it will successfully compile, together with a comparison of the speed of the Lisp and machine-code versions.

Before compiling a new function you might want to remove the previous one using makunbound, to free up the code memory it occupies; for example:

(makunbound 'factor)

Alternatively you could increase the amount of memory available for machine code by editing the CODESIZE directive in the uLisp source, such as:

#define CODESIZE 256

before uploading uLisp to your board.

How the compiler works

Register usage

To avoid needing to keep track of register usage the compiler makes use of the stack to pass values to an expression, and store the value returned by an expression.

The following table shows how the ARM registers are used within the compiler:

Registers	Use
r0 r1 r2 r3	Used to pass the arguments to the main function.
r0	Contains the value returned by the main function.
r4 r5 r6 r7	Contain copies of the function arguments within the function.
r0 r1	Used to pass the arguments to each operator.
r0	Used to return the value from each operator.
r2	Used to return the true/nil value from comparisons.

Compiling an expression

The following steps show the sequence of compiling an expression, such as:

(* x 13)

Code is generated to evaluate each of the arguments, in this case x and 13, and each result is pushed onto the stack, apart from the last which is left in r0.
The first value is popped from the stack into register r1.
The function, in this case *, is then evaluated for r1 and r0, with the result in r0.

This stack-based approach ensures that a more complex expression, such as:

(* (- x 1) (+ x 13))

will also compile into correct code, without conflicts between registers.
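The steps above can be sketched in a few lines of Common Lisp. This is only an illustration of the stack-based scheme, not the compiler's own comp function: the names sketch-comp and sketch-operator are hypothetical, it handles only two-argument +, -, and *, and it assumes constants small enough for a plain $mov. The instruction forms follow the listings shown elsewhere in this article.

```lisp
;; Hypothetical sketch; not the compiler's own comp function.

(defun sketch-operator (op)
  ;; Instruction that combines r1 (first argument) and r0 (second
  ;; argument), leaving the result in r0.
  (case op
    (* '($mul 'r0 'r1))
    (+ '($add 'r0 'r1 'r0))
    (- '($sub 'r0 'r1 'r0))))

(defun sketch-comp (expr vars)
  ;; Returns a list of assembler forms that leave the value of EXPR in r0.
  ;; VARS is an association list from variable names to the registers
  ;; holding them, for example ((x . r4)).
  (cond
    ;; A small integer constant is loaded straight into r0.
    ((integerp expr) (list (list '$mov ''r0 expr)))
    ;; A variable is copied from the register that holds it.
    ((symbolp expr)
     (list (list '$mov ''r0 (list 'quote (cdr (assoc expr vars))))))
    ;; A two-argument call: evaluate the first argument and push it,
    ;; evaluate the second into r0, pop the first into r1, then operate.
    (t (append (sketch-comp (second expr) vars)
               (list '($push '(r0)))
               (sketch-comp (third expr) vars)
               (list '($pop '(r1)))
               (list (sketch-operator (first expr)))))))
```

Calling (sketch-comp '(* 13 17) nil) gives the same five instructions as the comp listing shown earlier. Because each subexpression leaves the stack as it found it, nesting such as (sketch-comp '(* (- x 1) (+ x 13)) '((x . r4))) needs no register allocation.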

Calling the function recursively

The compiler supports calling a function recursively from within the function itself. Because the registers corresponding to the parameters and local variables would be overwritten by the recursive call they are stored on the stack around the function call.

There are several recursive functions in the examples below.

Types

For boolean operations I decided to represent nil as zero, and t as 1. A problem I hadn’t anticipated was that I would need to keep track of what type of object each function returned, integer or boolean. For example, consider the problem of compiling the statement:

(and x y)

If x has the value 0 and y has the value 7 this should return 7. However, if x has the value nil and y has the value 7 this should return nil. Representing nil as zero leads to an ambiguity.

I solved this by returning a type, :integer or :boolean, with each compiled expression, according to the following rules:

Predicates, and t or nil, always return a :boolean.
Arithmetic operations always return an :integer.
An if form requires a :boolean test form and returns an :integer.
A progn or let block returns the type of its last expression.

An item with an ambiguous type returns the type nil.

The compiler gives a warning if a function seems to use incorrect types.
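The typing rules above can be sketched as a small Common Lisp function. This is an illustration rather than the compiler's actual code: sketch-type is a hypothetical name, the lists of operators are an assumption based on the features mentioned in this article, and treating a literal integer as an :integer and a variable as ambiguous are also assumptions.

```lisp
;; Hypothetical sketch of the typing rules; not the compiler's own code.
(defun sketch-type (expr)
  (cond
    ;; t and nil are booleans.
    ((member expr '(t nil)) :boolean)
    ;; Assumption: a literal integer counts as an :integer.
    ((integerp expr) :integer)
    ;; Assumption: a variable could hold either, so it is ambiguous.
    ((symbolp expr) nil)
    ;; Predicates and comparisons return a :boolean.
    ((member (first expr) '(< > not zerop plusp minusp oddp evenp))
     :boolean)
    ;; Arithmetic operations return an :integer, and so does if.
    ((member (first expr) '(+ - * / mod ash if)) :integer)
    ;; progn and let take the type of their last form.
    ((member (first expr) '(progn let))
     (sketch-type (first (last expr))))
    ;; Anything else is treated as ambiguous in this sketch.
    (t nil)))
```

For example, (sketch-type '(zerop x)) gives :boolean and (sketch-type '(+ x 1)) gives :integer, while a bare variable gives nil.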

Examples

I used the following simple examples to test the new features in this version of the compiler. The compiler will also compile all the examples in the earlier article: A Lisp compiler to ARM written in Lisp.

To compile a Lisp function you simply give the command compile followed by the name of the function; for example, to compile the factor function:

(compile 'factor)

Factor

This function takes a simple approach to finding the least prime factor of a number by testing candidate factors up to the square root of the number. It uses the new features of the compiler: let, loop, return, when, and mod:

(defun factor (n)
  (let ((d 2) (i 1))
    (loop
     (when (> (* d d) n) (return n))
     (when (zerop (mod n d)) (return d))
     (setq d (+ d i)) (setq i 2))))

Lisp version:

> (time (factor 2147302777))
46327
Time: 5.3 s

Compiled version:

> (time (factor 2147302777))
46327
Time: 24 ms

Subset Sum problem

This is an NP-hard problem, for which no efficient general algorithm is known, so the program here uses an exhaustive search (see Subset sum problem on Wikipedia). The problem is as follows: given a list of integers and a target total, find whether the target can be reached by summing a subset of the integers.

The following recursive Lisp program solves the problem (see Recursive solution to Subset Sum on Lispology). It uses the new features of the compiler: nth and zerop:

(defun subsetsum-p (lis n sum)
  (if (zerop sum) t
    (if (zerop n) nil
      (or (subsetsum-p lis (1- n) sum)
          (subsetsum-p lis (1- n) (- sum (nth (1- n) lis)))))))

To test the function I used this list:

(defvar *lis* '(2 4 6 8 10 12 14 16 18 20 22 24 26 28 30))

Lisp version:

> (time (subsetsum-p *lis* (length *lis*) 200))
t
Time: 8.4 s

> (time (subsetsum-p *lis* (length *lis*) 199))
nil
Time: 16.9 s

Here I’ve contrived the list to only contain even numbers, so we can be sure that 199 will be unreachable.

Compiled version

The compiled version returns the numbers 0 for nil and 1 for t:

> (time (subsetsum-p *lis* (length *lis*) 200))
1
Time: 74 ms

> (time (subsetsum-p *lis* (length *lis*) 199))
0
Time: 74 ms

Exponentiation

The following function calculates x^y. The result must fit in a 32-bit integer. It uses the following new features: let, loop, return, zerop, oddp, and ash:

(defun iex (x y)
  (let ((e 1))
    (loop
     (when (zerop y) (return e))
     (when (oddp y) (setq e (* e x)))
     (setq x (* x x))
     (setq y (ash y -1)))))

Lisp version:

> (time (iex 7 11))
1977326743
Time: 1 ms

Compiled version:

> (time (iex 7 11))
1977326743
Time: 0 ms
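To see how the loop in iex reaches its answer, here is a tracing variant for Common Lisp on a desktop. It is a sketch for illustration only: iex-trace is a hypothetical name, and it returns the list of (e x y) states at the top of each pass rather than just the result.

```lisp
;; Hypothetical tracing variant of iex; not part of the compiler examples.
(defun iex-trace (x y)
  (let ((e 1) (steps nil))
    (loop
     ;; Record the state at the top of each pass round the loop.
     (push (list e x y) steps)
     (when (zerop y) (return (reverse steps)))
     ;; Multiply in the current power of x when the low bit of y is set.
     (when (oddp y) (setq e (* e x)))
     ;; Square x and shift y right by one bit.
     (setq x (* x x))
     (setq y (ash y -1)))))
```

For (iex-trace 7 11) the exponent y goes 11, 5, 2, 1, 0, so the loop body runs only four times, and the last state has e equal to 1977326743. On every pass e multiplied by x to the power y equals the original 7^11. The final squaring of x gives a value too large for 32 bits, but that value is never used.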

Reverse digits

This function reverses the digits of an integer:

(defun reversedigits (n)
  (let ((a 0))
    (loop
     (setq a (+ (* 10 a) (mod n 10)))
     (setq n (truncate n 10))
     (when (zerop n) (return a)))))

It uses the new features: let, loop, return, zerop, truncate, and mod.

Lisp version:

> (time (reversedigits 123456789))
987654321
Time: 2 ms

Compiled version:

> (time (reversedigits 123456789))
987654321
Time: 0 ms

Compiler source

Here’s the source of this version of the compiler: Lisp compiler for ARM 2.

To use the compiler you also need to load the ARM assembler from: ARM assembler in uLisp.

The Thumb-2 extensions are included with the compiler.

johnsondavies · 2025-09-10 07:24:53 UTC

Here’s an example showing the code it generates for a recursive implementation of the Fibonacci series:

(defun fibonacci (n)
  (if (< n 3) 1
    (+ (fibonacci (- n 1)) (fibonacci (- n 2)))))

> (time (fibonacci 27))
196418
Time: 51.5 s

> (compile 'fibonacci)
0000      fibonacci
0000 b510 ($push '(lr r4))
0002 0004 ($mov 'r4 'r0)
0004 0020 ($mov 'r0 'r4)
0006 b401 ($push '(r0))
0008 2003 ($mov 'r0 3)
000a bc02 ($pop '(r1))
000c 4281 ($cmp 'r1 'r0)
000e bfb4 ($it 'e 'lt)
0010 2001 ($mov 'r0 1)
0012 2000 ($mov 'r0 0)
0014 b108 ($cbz 'r0 label1)
0016 2001 ($mov 'r0 1)
0018 e010 ($b label2)
001a      label1
001a 0020 ($mov 'r0 'r4)
001c b401 ($push '(r0))
001e 2001 ($mov 'r0 1)
0020 bc02 ($pop '(r1))
0022 1a08 ($sub 'r0 'r1 'r0)
0024 f7ff ($bl fibonacci)
0026 ffec 
0028 b401 ($push '(r0))
002a 0020 ($mov 'r0 'r4)
002c b401 ($push '(r0))
002e 2002 ($mov 'r0 2)
0030 bc02 ($pop '(r1))
0032 1a08 ($sub 'r0 'r1 'r0)
0034 f7ff ($bl fibonacci)
0036 ffe4 
0038 bc02 ($pop '(r1))
003a 1808 ($add 'r0 'r1 'r0)
003c      label2
003c bd10 ($pop '(pc r4))
fibonacci

> (time (fibonacci 27))
196418
Time: 262 ms

eliot · 2025-09-10 17:17:11 UTC

I learned that the author of the book, Peter Norvig, has an open-source repository with the book and the included code.

Here’s the download link for the newest release.

https://github.com/norvig/paip-lisp/releases/download/v1.3/PAIP-6th-ed-with-toc.pdf (6th edition, 2001 - Scanned in June 2022 with table of contents added. PDF 42.9 MB)

Chapter 23: “Compiling Lisp” demonstrates a Scheme compiler that targets a hypothetical stack machine with a minimal set of instructions.

So far we have defined the instruction set of a mythical abstract machine and generated assembly code for that instruction set. It’s now time to actually execute the assembly code and hence have a useful compiler.

There are several paths we could pursue: we could implement the machine in hardware, software, or microcode, or we could translate the assembly code for our abstract machine into the assembly code of some existing machine.

An abstract machine is a cool concept. It implies a universal compiler that targets a virtual “machine made of code” with a common set of instructions, with multiple assemblers that then translate those instructions into machine code for specific hardware.

With the uLisp compiler to ARM, I was curious why there’s a separate compiler and an assembler. As I read more of the PAIP book above, I think I’m starting to understand. If I get it correctly, the compiler takes a Lisp program and generates another Lisp program. The latter is written in a kind of domain-specific language of assembly instructions, like $mov, $push, $pop. When it’s called, it emits binary machine code specific to ARM hardware - and runs it.
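To make that idea concrete, here is a rough sketch in Common Lisp of how an assembler function could turn an instruction into a 16-bit machine-code value. These are not the definitions used by the uLisp ARM assembler: sketch-regno, sketch-mov-imm, and sketch-push-one are hypothetical names, and only the standard Thumb encodings for MOVS with an 8-bit immediate and for PUSH of a single low register are covered.

```lisp
;; Hypothetical sketch; not the uLisp assembler's own $mov and $push.

(defun sketch-regno (reg)
  ;; Number of a low register, r0 to r7.
  (position reg '(r0 r1 r2 r3 r4 r5 r6 r7)))

(defun sketch-mov-imm (reg imm)
  ;; Thumb MOVS Rd, #imm8: bits 00100, then 3 bits of Rd, then 8 bits of imm.
  (logior #x2000 (ash (sketch-regno reg) 8) (logand imm #xff)))

(defun sketch-push-one (reg)
  ;; Thumb PUSH of one low register: #xb400 with one bit set per register.
  (logior #xb400 (ash 1 (sketch-regno reg))))
```

With these, (sketch-mov-imm 'r0 3) gives #x2003 and (sketch-push-one 'r0) gives #xb401, the same values shown next to ($mov 'r0 3) and ($push '(r0)) in the Fibonacci listing above.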

With my interest in WebAssembly, which has a fairly small set of instructions, it makes me dream of a Lisp to Wasm compiler. But I think the Wasm architecture doesn’t allow for dynamically generated executable code - something about code and data being in separate memory address spaces.

In just-in-time (JIT) code generation, the program generates bytes for the instruction set of the machine it’s running on, and then transfers control to those instructions.

Usually the program has to put its generated code in memory that is specially marked as executable. However, this capability is missing in WebAssembly.

… Here the answer is call_indirect. The key idea here is that to add code, the main program should generate a new WebAssembly module containing that code. Then we run a linking phase to actually bring that new code to life and make it available.

– just-in-time code generation within webassembly

The linked article (and source code in C++) demonstrates a Scheme to WebAssembly compiler. Looks tricky though.

A different approach I’ve been thinking of is a compiler that converts a uLisp program to WAT (WebAssembly Text Format), which is represented by S-expressions and seems easier to generate; then wat2wasm (source and demo) could convert it to binary, to run or export as a file.

Here’s a recursive factorial function, for example. It looks similar to what the ARM compiler is doing, such as pushing and popping values on the stack.

(module
  (func $fac (export "fac") (param f64) (result f64)
    local.get 0
    f64.const 1
    f64.lt
    if (result f64)
      f64.const 1
    else
      local.get 0
      local.get 0
      f64.const 1
      f64.sub
      call $fac
      f64.mul
    end))
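As a rough idea of what the first step of such a compiler might look like, here is a sketch in Common Lisp that turns a nested arithmetic expression into a flat list of WAT stack instructions. It is only an illustration under assumptions: sketch-wat is a hypothetical name, it handles only two-argument +, -, and * on f64 values, and it refers to parameters by their index.

```lisp
;; Hypothetical sketch of generating WAT instructions from a Lisp expression.
(defun sketch-wat (expr params)
  ;; Returns a list of WAT instruction strings that leave the value of
  ;; EXPR on the WebAssembly stack. PARAMS lists the parameter names in order.
  (cond
    ;; A number becomes a constant.
    ((numberp expr) (list (format nil "f64.const ~a" expr)))
    ;; A parameter is fetched by its index.
    ((symbolp expr) (list (format nil "local.get ~a" (position expr params))))
    ;; A two-argument call: both arguments, then the operator.
    (t (append (sketch-wat (second expr) params)
               (sketch-wat (third expr) params)
               (list (case (first expr)
                       (+ "f64.add")
                       (- "f64.sub")
                       (* "f64.mul")))))))
```

For example, (sketch-wat '(* n (- n 1)) '(n)) produces local.get 0, local.get 0, f64.const 1, f64.sub, f64.mul, which is the else branch of the factorial above without the call to $fac. Since WebAssembly is itself a stack machine, no explicit push and pop instructions are needed.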

I found a section in Structure and Interpretation of Computer Programs that discusses compilers.

In this section, we explore the alternative strategy of compilation. A compiler for a given source language and machine translates a source program into an equivalent program (called the object program) written in the machine’s native language.

The compiler that we implement in this section translates programs written in Scheme into sequences of instructions to be executed using the explicit-control evaluator machine’s data paths. Compared with interpretation, compilation can provide a great increase in the efficiency of program execution.

– https://sarabander.github.io/sicp/html/5_002e5.xhtml#g_t5_002e5

That makes sense: compilation is a type of code generation, a program that transforms programs.

It’s kind of magical how much simpler a Lisp compiler written in Lisp can be, compared to how big and complex an equivalent C parser + compiler written in C would be. This compiler implementation and the informative post make me appreciate the unique benefits of the language.

I especially love how uLisp and the assembler let you get down to the bare metal. The speed improvements of the compiled functions are impressive; some of them are around 200 times faster.