RP2350 RISC-V vs ARM performance


#1

The RP2350 on the Raspberry Pi Pico 2 has a pair of ARM Cortex-M33 cores and a pair of Hazard3 RISC-V cores. Since uLisp supports both ARM and RISC-V, I’ve been wondering which CPU is faster for use with uLisp.

Looking at the uLisp Performance page, I had assumed that uLisp on the RISC-V core would be running significantly slower than on the ARM core. These are the listed numbers for the Raspberry Pi Pico 2:

  • ARM: 5.0 ms, 2.5 s, 5.4 s (gc time, Tak, Q2)
  • RISC-V: 4.3 ms, 4.0 s, 9.3 s (gc time, Tak, Q2)

So it looked like the ARM core would be roughly 1.6 to 1.7 times as fast as the RISC-V core (2.5 s vs 4.0 s on Tak, 5.4 s vs 9.3 s on Q2). The exception is garbage collection, though the slightly faster time there for the RISC-V core could be partially attributed to its slightly smaller workspace (anyone know why it is smaller?).

My measurements
However, this is what I found when I tried those benchmarks with the latest Arduino IDE. For ARM, I found that some benchmarks run faster with -O2 than -O3, so I’m reporting both results. For RISC-V, -O3 was fastest across the board.

  • ARM -O3: 5 ms, 2.6 s, 5.9 s (gc time, Tak, Q2)
  • ARM -O2: 5 ms, 2.2 s, 4.7 s (gc time, Tak, Q2)
  • RISC-V -O3: 4 ms, 1.9 s, 4.2 s (gc time, Tak, Q2)

So the RISC-V core at -O3 beats the ARM core at both -O2 and -O3 in both the Takeuchi and the Q2 benchmarks.
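
For anyone who wants to try the Tak comparison themselves, here is a sketch of the classic Takeuchi function in uLisp. This is the textbook definition; this thread doesn't show the exact code or arguments behind the figures above, so the (18 12 6) call is an assumption on my part.

```lisp
; Classic Takeuchi function (textbook definition).
(defun tak (x y z)
  (if (not (< y x))
      z
      (tak (tak (1- x) y z)
           (tak (1- y) z x)
           (tak (1- z) x y))))

; Assumed arguments; time reports how long the evaluation took.
(time (tak 18 12 6))
```

Since the function only does integer comparisons, decrements and recursive calls, it mostly measures the speed of the evaluator itself rather than any arithmetic hardware.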

What about floating point?
The ARM cores on the RP2350 have a full hardware floating point unit, while the RISC-V cores do not. So maybe the ARM core will really shine once floating point operations are involved?

So I also ran the Fast Fourier Transform benchmark:

  • ARM -O3: 64 ms
  • ARM -O2: 70 ms
  • RISC-V -O3: 60 ms

Surprise! Even though the RISC-V core has to fall back on software floating point, its faster handling of general uLisp evaluation more than makes up for it.

I also tried the following micro-benchmark in an attempt to maximize the “concentration” of floating point operations:
(time (dotimes (x 1000) (tanh 0.5)))

Finally, the ARM CPU with its FPU was able to pull ahead:

  • ARM -O3: 6 ms
  • RISC-V -O3: 17 ms
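
One way to check how much of that time is tanh itself rather than loop and evaluator overhead would be to time the same loop with a constant body and compare the two. A minimal sketch, which I have not run on either core:

```lisp
; Baseline: same loop with a constant body, no floating-point call
(time (dotimes (x 1000) 0.5))

; Original micro-benchmark: loop plus the tanh call
(time (dotimes (x 1000) (tanh 0.5)))
```

The difference between the two timings is a rough estimate of the cost of the 1000 tanh calls. With timings this short the millisecond resolution is coarse, so a larger iteration count would probably give steadier numbers.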

Conclusions
For integer code, the RISC-V cores in the RP2350 now seem to be a faster choice for running uLisp than its ARM cores.

Even for many floating-point-heavy workloads, the overhead of evaluating Lisp expressions and operating on linked-list structures appears to be high enough that the floating-point advantage of the ARM cores is not enough to give them the lead. Only in very specific floating-point-heavy benchmarks can the ARM cores pull ahead.

I suspect the difference between my numbers and the ones listed on the uLisp Performance page might be due to newer compiler versions. The RISC-V performance appears to be very sensitive to compiler optimizations, and it only really shines once you enable -O3. I also found this video with some (non-uLisp) benchmarks comparing the RP2350’s RISC-V and ARM cores when using different compilers, and the author found very significant differences depending on which compiler version they were using.


#2

Very interesting. I assume you’ve used the latest release of ARM uLisp, 4.8f. It would also be useful to know which version of the Raspberry Pi Pico/RP2040/RP2350 core you’ve used. The latest one seems to be 5.4.2.

I hadn’t actually got around to updating the Raspberry Pi Pico timings on the Performance page for this version of uLisp (as indicated by the lack of ‘‡’), and also I always take the timings with the default optimization level; as you’ve written, this can make a large difference.