RISC-V assembler RP2350 extensions


#1

The RP2350 Hazard3 RISC-V core designed by Luke Wren extends the base 32-bit RISC-V instruction set with a number of RISC-V extensions. To my mind the most interesting of these to uLisp users are the Zbb, Zbs, and Zbkb extensions which provide bit manipulations and single bit instructions which could be particularly useful in embedded and electronics applications. I’ve defined an additional RISC-V extensions file to allow you to add support for these to the RISC-V assembler.

Updated on 22nd October 2024 to correct some mistakes, and add a bit more information.

Loading the RISC-V extensions

To add the extensions load the standard assembler file first, followed by the extensions file, because some of the extensions add compressed versions of the instructions in the main file.

Get the standard assembler file here: RISC-V assembler in uLisp.

Get the extensions here: RISC-V RP2350 extensions.

For a summary of the extensions see: Lisp RISC-V assembler RP2350 extensions

Note that these extensions won’t work on the Kendryte K210 RISC-V processor used on the Sipeed MAiX boards, which is also supported by the RISC-V assembler in uLisp.

Examples

It’s not obvious what some of these extensions might be useful for, so the following examples demonstrate some possible applications:

Reverse bits – brev8 and rev8

This is a function to efficiently reverse the order of bits in a 32-bit number. The reverse-bits operation could be useful when transforming bitmap images, or when interfacing between protocols that work MSB first and LSB first. It takes advantage of the brev8 instruction that reverses the bits within each byte, and the rev8 instruction that reverses the order of the bytes:

(defcode reverse-bits (n)
  ; Reverse bits within each byte
  ($brev8 'a0 'a0)
  ; Reverse all bytes
  ($rev8 'a0 'a0)
  ($ret))

For example:

> (format t "~b" (reverse-bits #b10110011100011110000111110000011))
11000001111100001111000111001101

Maximum number in a list - max

The following example demonstrates the use of the max instruction that returns the maximum of two signed integers. It finds the largest integer in a list of arbitrary length:

(defcode maximum (x)
  ($lui 'a2 #x80000)
  repeat
  ($beqz 'a0 finished)
  ($lw 'a1 0 '(a0))
  ($lw 'a1 4 '(a1))
  ($max 'a2 'a1 'a2)
  ($lw 'a0 4 '(a0))
  ($j repeat)
  finished
  ($mv 'a0 'a2)
  ($ret))

For example:

> (maximum '(23 -91 47 -73 11))
47

It iterates through the list keeping track of the largest value found so far. Obviously you can also use min to find the smallest value.

Integer square root - clz

The new clz instruction counts the number of leading zeros of a register. It provides an easy way of getting upper and lower bounds for the integer part of the square root of a number. These are useful for applications such as finding prime numbers, where the upper bound gives the largest factor you need to test. If a more accurate result is needed, these bounds can be used as the starting point for Newton’s method, or a binary search.

The algorithm takes advantage of the fact that the length of the binary representation of a number’s integer square root is approximately half that of the original number.

Here is the upper bound routine, upper-sqrt:

(defcode upper-sqrt (x)
  ($li 'a1 33)
  ($li 'a2 1)
  ($clz 'a0 'a0)
  ($sub 'a0 'a1 'a0)
  ($srli 'a0 'a0 1)
  ($sll 'a0 'a2 'a0)
  ($addi 'a0 'a0 -1)
  ($ret))

For example:

> (upper-sqrt 9)
3

> (upper-sqrt 1000000)
1023

> (upper-sqrt 1600000000)
65535

A compact representation for unsigned integers - clz and ror

A 32-bit unsigned integer has a range of 0 to 2^(32-1) and precision of 1 in 2^32. Is it possible to devise a more compact 16-bit floating-point format that will represent the same range, but with reduced precision? This might be useful, for example, to log the values from an analogue-to-digital converter with limited storage.

The solution is to normalize the 32-bit unsigned integer, by shifting it left until the most significant bit is a ‘1’. Then store the number in a 16-bit halfword with the top five bits (E, the exponent) giving the amount of the shift, and the bottom 11 bits (F, the fractional part) giving the top 11 bits of the normalized number.

A number N is then: N = F × 2^(11-E).

The range is still 0 to 2^(32-1) but the precision is 1 in 2^11. I’ve called this format ufloat16 .

Here’s the routine to encode a 32-bit unsigned integer, which is another application of the clz (count leading zeros) instruction:

(defcode to-ufloat16 (n)
   ; Normalize
   ($clz 'a1 'a0)
   ($andi 'a1 'a1 #x1f)
   ($sll 'a0 'a0 'a1)
   ; Shift back down to bottom 11 bits
   ($srli 'a0 'a0 21)
   ; Shift result of clz to top 5 bits
   ($slli 'a1 'a1 11)
   ; Pack into 16 bits
   ($or 'a0 'a1 'a0)
   ($ret))

Here’s the routine to unpack an integer in ufloat16 notation, which uses the new ror (rotate right) instruction:

(defcode from-ufloat16 (n)
   ; Get the exponent from top 5 bits
   ($srli 'a1 'a0 11)
   ; Get the fraction from bottom 11 bits
   ($li 'a2 #x7ff)
   ($and 'a0 'a0 'a2)
   ; Shift up/down by the exponent
   ($addi 'a1 'a1 11)
   ($ror 'a0 'a0 'a1)
   ($ret))

Here are some examples (using a Lisp format statement to print the results in hexadecimal where appropriate).

Numbers up to 2048 are encoded without loss of precision:

> (format t "#x~4,'0x" (to-ufloat16 1))
#xfc00

> (from-ufloat16 #xfc00)
1
> (format t "#x~4,'0x" (to-ufloat16 2048))
#xa400

> (from-ufloat16 #xa400)
2048

Numbers over 2048 have 11-bits precision:

> (format t "#x~4,'0x" (to-ufloat16 4661))
#x9c8d

> (from-ufloat16 #x9c8d)
4660

up to the maximum unsigned 32-bit number #xffffffff:

> (format t "#x~4,'0x" (to-ufloat16 #xffffffff))
#x07ff

> (format t "#x~8,'0x" (from-ufloat16 #x07ff))
#xffe00000

You could do something similar to represent signed 32-bit integers in 16 bits by using one of the bits as a sign bit.

Interleaving two integers - zip and unzip

The following example is a way to encode two small integers, such as a pair of coordinates, as a single compact integer. The encoding technique involves expressing the two numbers in binary, and then interleaving the bitstrings, right-aligned, so their bits alternate.

The new zip and unzip instructions are ideal for this application. The zip instruction interleaves the upper and lower half of a register into the odd and even bits of the result, and unzip does the reverse operation.

The function encode takes two integers of 16 bits or less, and interleaves them into a single integer:

(defcode encode (x y)
  ($pack 'a2 'a0 'a1)
  ($zip 'a0 'a2)
  ($ret))

For example:

> (encode 137 73)
24771

The function decode takes a single integer and decodes it into a list of the original two numbers. It uses a machine-code function unzip:

(defcode unzip (x)
  ($unzip 'a0 'a0)
  ($ret))

(defun decode (x)
  (let ((u (unzip x)))
    (list (logand u #xffff) (logand (ash u -16) #xffff)))) 

For example:

> (decode 24771)
(137 73)

The zip instruction is also useful for making double-width characters from bitmap fonts, by doubling each column of pixels.

Binomial random number generator - cpop

The next example shows how to generate random numbers with a binomial distribution. It uses the new cpop instruction (standing for population count) which counts the number of ‘1’ bits in a register.

To get a random number from 0 to 20 with a binomial distribution you can simulate coin tossing by generating a 20-bit random number, and then counting the number of ‘1’ bits. The cpop instruction will do this:

(defcode popcount (n)
  ($cpop 'a0 'a0)
  ($ret))

For example:

> (popcount #b10101010101010101010)
10

The final binomial random number generator is then:

(defun binomial-random ()
  (popcount (random #xfffff)))

Trying it out:

> (dotimes (x 20) (format t "~a " (binomial-random)))
11 9 10 10 9 10 10 8 10 7 11 13 10 15 6 9 11 13 14 13