An ARM assembler written in Lisp


You can write machine-code functions in uLisp with the help of the ARM assembler written in Lisp, and I’ve recently updated it to make it more compact. It will now fit on a board with about 2000 objects of workspace, with room to spare to write assembler programs and run them.

This post describes how the latest version of the ARM assembler works. The aim is to help anyone who wants to extend the assembler to cater for ARM instructions that it doesn’t currently support. It will also be helpful if you want to write an assembler for another processor, or even design your own processor and write an assembler for it; Lisp is an excellent language for this. As an indication of how compact it can be, a printout of the whole ARM assembler fits on two A4 pages.

Instruction encodings

The starting point for writing an assembler is to get hold of a summary of the processor’s instruction encodings. For the ARM Thumb instruction set these are as follows:

ARM Thumb instruction encodings for instructions starting #x0 to #x8.

ARM Thumb instruction encodings for instructions starting #x9 to #xF.

You can see from these diagrams that the 16-bit instructions are arranged into consistent field patterns. This is true of most processor instruction sets, but some are more orderly than others (RISC-V is a nightmare!).

An example - LSL

As an example, consider the first instruction in the first table, LSL (Logical Shift Left) immediate.

This consists of:

  • The four-bit value #b0000.
  • A one-bit op code, which is 0 for LSL and 1 for LSR.
  • An immed5 value, which is a 5-bit integer from 0 to 31 giving the size of the left shift.
  • Lm, which is a value from 0 to 7 representing the source register R0 to R7.
  • Ld, which is a value from 0 to 7 representing the destination register R0 to R7.
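To make the layout concrete, here is the same packing done by hand with ash and logior. It assumes the bit positions implied by the field widths above: the fixed #b0000 in bits 15 to 12, op in bit 11, immed5 in bits 10 to 6, Lm in bits 5 to 3, and Ld in bits 2 to 0. The function pack-lsl-by-hand is a hypothetical name used only for this sketch; it isn’t part of the assembler, and unlike emit it doesn’t check that the values fit.

```lisp
(defun pack-lsl-by-hand (op immed5 lm ld)
  "Pack the LSL/LSR immediate fields at fixed bit positions (no range checks)."
  (logior (ash #b0000 12)   ; bits 15-12: the fixed value #b0000
          (ash op 11)       ; bit 11: 0 for LSL, 1 for LSR
          (ash immed5 6)    ; bits 10-6: shift amount, 0 to 31
          (ash lm 3)        ; bits 5-3: source register
          ld))              ; bits 2-0: destination register

;; LSL r7, r4, #31
(print (pack-lsl-by-hand 0 31 4 7))   ; prints 2023
```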

Emitting bit fields

The first function we need is emit, which takes a specification defining the widths of the bit fields, and a list of arguments, and packs the values of the arguments into the bit fields:

(defun emit (bits &rest args)
  (let ((word 0) (shift -28))
    (mapc #'(lambda (value)
              ;; The next hex digit of bits gives the width of this field
              (let ((width (logand (ash bits shift) #xf)))
                (incf shift 4)
                (unless (zerop (ash value (- width))) (error "Won't fit"))
                ;; Make room for the field and OR in the value
                (setq word (logior (ash word width) value))))
          args)
    word))

The first argument, bits, is a 32-bit hexadecimal number in which each hex digit specifies the width of the next bit field. The function emit reads the hex digits in bits from left to right, packs the appropriate number of bits from each argument into word, and then returns the result.

For example, the bit fields for the LSL instruction could be specified by:

#x41533

To make the bit fields easier to process, the widths are left-aligned, so you should pad the bits parameter with zeros to eight hex digits; for LSL this gives #x41533000.
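If you’d rather not count hex digits yourself, the bits parameter can be built from a list of field widths. The field-spec function below is a hypothetical helper, not part of the published assembler. It is a sketch that places each width in successive hex digits, starting from the most significant one, which gives the left alignment that emit expects.

```lisp
(defun field-spec (widths)
  "Build a left-aligned emit specification from a list of field widths."
  (when (> (length widths) 8) (error "At most eight fields"))
  (let ((spec 0) (shift 28))
    (dolist (w widths spec)
      ;; Each width has to fit in a single hex digit
      (unless (<= 1 w 15) (error "Width ~a won't fit in one hex digit" w))
      (setq spec (logior spec (ash w shift)))
      (decf shift 4))))
```

For example, (field-spec '(4 1 5 3 3)) returns the same number as #x41533000.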

The remaining arguments are the values to be packed into the bit fields. If any argument won’t fit into the corresponding bit field, the error Won’t fit is displayed.

So, for example, to emit the op code for the instruction:

LSL r7, r4, #31

you’d evaluate:

> (emit #x41533000 0 0 31 4 7)
2023

If you print this as a 16-bit binary number with:

> (format t "~16,'0b" 2023)
0000011111100111

you can see that the values have been put into the correct fields as required.
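Going the other way can be handy when you’re checking an encoding against the ARM tables. The unemit function below is a hypothetical inverse of emit, sketched in Common Lisp; it isn’t part of the assembler, and it uses ldb and byte, which uLisp may not provide. It reads the field widths from the same specification, stopping at the first zero digit, and extracts each field starting from the most significant end of the word.

```lisp
(defun unemit (bits word)
  "Split WORD into the bit fields described by the width specification BITS."
  (let ((widths '()) (total 0))
    ;; Read the widths from the most significant hex digit,
    ;; stopping at the first zero digit
    (loop for shift from -28 to 0 by 4
          for width = (logand (ash bits shift) #xf)
          until (zerop width)
          do (push width widths)
             (incf total width))
    ;; Extract the fields, starting from the most significant end of the word
    (let ((pos total))
      (loop for width in (nreverse widths)
            do (decf pos width)
            collect (ldb (byte width pos) word)))))
```

For example, (unemit #x41533000 2023) returns (0 0 31 4 7), the arguments originally passed to emit.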

Specifying registers

The next step is to be able to specify registers as r0 to r15, or their synonyms sp (for r13), lr (for r14), and pc (for r15). This is handled by the function regno:

(defun regno (sym)
  (case sym (sp 13) (lr 14) (pc 15)
    (t (read-from-string (subseq (string sym) 1)))))

For example:

> (regno 'r12)
12
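Note that regno doesn’t check its argument. For example, (regno 'r99) simply returns 99, and a symbol that isn’t a register name gives an unhelpful result. If you’d like clearer error messages, a stricter version could look something like the following. The name regno* is hypothetical, and this is a Common Lisp sketch rather than part of the assembler.

```lisp
(defun regno* (sym)
  "Like regno, but signals an error unless SYM is sp, lr, pc or r0 to r15."
  (case sym
    (sp 13) (lr 14) (pc 15)
    (t (let ((name (string sym)))
         ;; Expect an R followed only by decimal digits
         (unless (and (> (length name) 1)
                      (char-equal (char name 0) #\R)
                      (every #'digit-char-p (subseq name 1)))
           (error "Not a register: ~a" sym))
         (let ((n (parse-integer name :start 1)))
           (unless (<= n 15) (error "No such register: ~a" sym))
           n)))))
```

Registers r8 to r15 in a three-bit field are still caught by the Won’t fit check in emit; regno* only catches misspelt names earlier.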

We can now define the LSL instruction as the convenient Lisp function $lsl:

(defun $lsl (argd argm immed5)
  (emit #x41533000 0 0 immed5 (regno argm) (regno argd)))

This allows us to specify the instruction using syntax that’s close to ARM assembler syntax:

> ($lsl 'r7 'r4 31)
2023

I’ve used the convention that functions representing ARM instructions are prefixed by a $ sign; otherwise there would be a problem with instructions that are also existing Lisp functions, such as push and pop.

Handling addressing modes

The final complication is that some instruction mnemonics can generate different op codes, depending on the types of their arguments.

For example, there’s also a variant of LSL that shifts a register Rd by the shift value specified in the register Rs.

Using this variant, the following assembler instruction shifts the value in R7 by the value in R1:

LSL r7, r1

The block of register-to-register instructions that include LSL is handled by the routine reg-reg:

(defun reg-reg (op argd argm)
  (emit #xa3300000 op (regno argm) (regno argd)))

Finally, we need to modify $lsl to include the register-to-register variant:

(defun $lsl (argd argm &optional arg2)
  (cond
   ((numberp arg2)                ; LSL Ld, Lm, #immed5
    (lsl-lsr-0 0 arg2 argm argd))
   ((numberp argm)                ; LSL Ld, #immed5
    (lsl-lsr-0 0 argm argd argd))
   (t                             ; LSL Ld, Ls
    (reg-reg #b0100000010 argd argm))))

where lsl-lsr-0 is defined as:

(defun lsl-lsr-0 (op immed5 argm argd)
  (emit #x41533000 0 op immed5 (regno argm) (regno argd)))

This expanded version of $lsl also handles the two-argument case where the source and destination registers are the same in an immediate shift; for example:

($lsl 'r1 31)
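The same helpers make it straightforward to add LSR. In the immediate form it sets op to 1 (as noted in the list of LSL fields), and in the register-to-register form it uses the op code #b0100000011. The following is a sketch of how $lsr could be written by analogy with $lsl, not necessarily how the published assembler defines it. It repeats emit, regno, reg-reg and lsl-lsr-0 so that it can be loaded on its own.

```lisp
;; emit, regno, reg-reg and lsl-lsr-0 as defined in this post
(defun emit (bits &rest args)
  (let ((word 0) (shift -28))
    (mapc #'(lambda (value)
              (let ((width (logand (ash bits shift) #xf)))
                (incf shift 4)
                (unless (zerop (ash value (- width))) (error "Won't fit"))
                (setq word (logior (ash word width) value))))
          args)
    word))

(defun regno (sym)
  (case sym (sp 13) (lr 14) (pc 15)
    (t (read-from-string (subseq (string sym) 1)))))

(defun reg-reg (op argd argm)
  (emit #xa3300000 op (regno argm) (regno argd)))

(defun lsl-lsr-0 (op immed5 argm argd)
  (emit #x41533000 0 op immed5 (regno argm) (regno argd)))

;; Sketch: LSR by analogy with $lsl
(defun $lsr (argd argm &optional arg2)
  (cond
   ((numberp arg2)                ; LSR Ld, Lm, #immed5
    (lsl-lsr-0 1 arg2 argm argd))
   ((numberp argm)                ; LSR Ld, #immed5 (Ld is also the source)
    (lsl-lsr-0 1 argm argd argd))
   (t                             ; LSR Ld, Ls (shift amount in Ls)
    (reg-reg #b0100000011 argd argm))))
```

For example, ($lsr 'r7 'r1) gives #x40CF, the encoding of LSR r7, r1.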

Running the assembler

To run the assembler in uLisp you use the built-in special form defcode, which generates an assembler listing and puts the machine code into RAM, so you can execute it as if it were a normal Lisp function.

Greatest Common Divisor example

For example, to assemble a machine-code routine gcd to calculate Greatest Common Divisor you’d evaluate:

; Greatest Common Divisor
(defcode gcd (x y)
  swap
  ($mov 'r2 'r1)
  ($mov 'r1 'r0)
  again
  ($mov 'r0 'r2)
  ($sub 'r2 'r2 'r1)
  ($blt swap)
  ($bne again)
  ($bx 'lr))

and you could then call:

> (gcd 3287 3460)
173
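To see what the machine code is doing, here’s the same subtract-and-swap algorithm written in ordinary Common Lisp, with variables named after the registers. The name gcd-sub is hypothetical. Where the machine code branches on the flags set by $sub, this sketch tests r2 with minusp and zerop. Like the machine-code version, it assumes both arguments are positive integers; with a zero argument it would loop forever.

```lisp
(defun gcd-sub (x y)
  "Greatest common divisor by repeated subtraction, mirroring the gcd machine code."
  (let ((r0 x) (r1 y) (r2 0))
    (loop                               ; swap:
      (setf r2 r1                       ;   ($mov 'r2 'r1)
            r1 r0)                      ;   ($mov 'r1 'r0)
      (loop                             ; again: repeats while r2 > 0, like ($bne again)
        (setf r0 r2                     ;   ($mov 'r0 'r2)
              r2 (- r2 r1))             ;   ($sub 'r2 'r2 'r1)
        (cond ((minusp r2) (return))    ;   ($blt swap)
              ((zerop r2)               ;   r2 = 0: fall through to ($bx 'lr),
               (return-from gcd-sub r0)))))))  ; returning the result in r0
```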

Running the assembler in Common Lisp

You can also run the ARM assembler in a standard Common Lisp implementation. The Common Lisp version of the ARM assembler includes the following defcode macro, which lets you assemble an ARM function and print the machine code, like the defcode special form built into uLisp:

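(defparameter *pc* 0)

(defmacro defcode (&body body)
  (let ((*print-pretty* t) (assembler (cddr body))) ; skip the name and parameter list
    ;; The first pass records the address of each label; the second prints the listing
    (dotimes (pass 2)
      (setq *pc* 0)
      (mapc
       #'(lambda (ins)
           (cond
            ;; A label: print it, and set the symbol to the current address
            ((atom ins)
             (unless (zerop pass) (format t "~4,'0x      ~(~a~)~%" *pc* ins))
             (set ins *pc*))
            ;; A 32-bit instruction, returned as a list of two 16-bit words
            ((listp (eval ins))
             (unless (zerop pass)
               (format t "~4,'0x ~4,'0x ~(~a~)~%" *pc* (first (eval ins)) ins)
               (format t "~4,'0x ~4,'0x~%" (+ *pc* 2) (second (eval ins))))
             (incf *pc* 4))
            ;; A 16-bit instruction
            (t
             (unless (zerop pass)
               (format t "~4,'0x ~4,'0x ~(~a~)~%" *pc* (eval ins) ins))
             (incf *pc* 2))))
       assembler))))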

Evaluating the Greatest Common Divisor example above generates the following output:

0000      swap
0000 000A ($mov 'r2 'r1)
0002 0001 ($mov 'r1 'r0)
0004      again
0004 0010 ($mov 'r0 'r2)
0006 1A52 ($sub 'r2 'r2 'r1)
0008 DBFA ($blt swap)
000A D1FB ($bne again)
000C 4770 ($bx 'lr)
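The two conditional branches in this listing show how branch offsets are encoded. In the Thumb conditional branch the top four bits are #b1101, followed by a four-bit condition code (#b0001 for NE, #b1011 for LT) and a signed eight-bit offset, counted in halfwords from the address of the branch plus 4. As a check, here’s a sketch that reproduces DBFA and D1FB. The cond-branch function is a hypothetical helper and isn’t necessarily how the assembler’s $blt and $bne are written; emit is repeated so that the block can be loaded on its own.

```lisp
;; emit as defined in this post
(defun emit (bits &rest args)
  (let ((word 0) (shift -28))
    (mapc #'(lambda (value)
              (let ((width (logand (ash bits shift) #xf)))
                (incf shift 4)
                (unless (zerop (ash value (- width))) (error "Won't fit"))
                (setq word (logior (ash word width) value))))
          args)
    word))

(defun cond-branch (cond-code label pc)
  ;; The offset is counted in halfwords from the branch address plus 4
  (let ((offset (ash (- label (+ pc 4)) -1)))
    (unless (<= -128 offset 127) (error "Branch out of range"))
    ;; Fields: #b1101, 4-bit condition, 8-bit two's-complement offset
    (emit #x44800000 #b1101 cond-code (logand offset #xff))))
```

For ($blt swap) at address 0008 the offset is (0 - 12)/2 = -6, which is #xFA in eight bits, giving DBFA.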

In Common Lisp you obviously won’t be able to run the machine code; defcode just prints the listing.


For both versions of the assembler see:

For more information see ARM assembler overview.

For a list of the ARM Thumb instructions supported by the assembler see: ARM assembler instructions.

For ARM assembler examples see: ARM assembler examples.


6th July 2023: Updated the defcode macro to handle forward references.



Thanks a lot for sharing.

Maybe a dumb question: it looks like this could be used for building “your own compiler”?

Let’s say that you would like to create a Structured Text (ST) compiler to run code on the Raspberry Pi Pico, like a Pi Pico-based PLC.

Would the following pipeline make sense?

ST --> Lisp Compiler --> Lisp ARM --> Machine Running Code

Or would this be better:

ST --> Lisp Compiler --> uLisp

Best Regards