Compiler mystery – any suggestions?

johnsondavies · 2021-07-16 17:43:51 UTC

While developing the latest release of the AVR version of uLisp, version 4.0/4.0a, I encountered some strange behaviour that I wasn’t able to explain.

The result is that I’ve included a statement in the release version of AVR uLisp that fixes the problem, but I’ve no idea why it works. Perhaps someone can help.

The behaviour

While testing the 4.0 beta version of uLisp on an Arduino Uno I discovered that it ran out of stack more readily than earlier versions of uLisp. For example, running the tak benchmark Version 4.0 beta ran out of stack with (tak 18 12 6), but the previous version, uLisp 3.6, could calculate this; for more details see below.

However, adding the following line into the source fixed this, making its stack performance as good as earlier versions:

int dummy; Serial.print((int)&dummy);

It doesn’t seem to be very critical where this line of code is added. To avoid it printing out information in the release versions of AVR uLisp, 4.0 and 4.0a, I have put it into the pseudoRandom() routine:

int dummy; if (RandomSeed == 0) Serial.print((int)&dummy); // Fixes the problem

It never gets executed because RandomSeed can never be zero, but it doesn’t get optimised out by the compiler because the compiler can’t spot this. For this reason simply doing this doesn’t work:

int dummy; if (false) Serial.print((int)&dummy); // Doesn't work

Testing the depth of recursion

To test the effect of the fix I tested the point at which the fibonacci fib function runs out of stack:

(defun fib (n)
  (if (< n 3) 1
    (+ (fib (- n 1)) (fib (- n 2)))))

(dotimes (x 24) (print x) (princ " ") (princ (fib x)))

I also tested the tak benchmark:

(defun tak (x y z)
  (if (not (< y x))
      z
    (tak
     (tak (1- x) y z)
     (tak (1- y) z x)
     (tak (1- z) x y))))

Here are the results:

AVR Version 4.0a

(fib 21) runs OK, (fib 22) crashes.

(tak 18 12 6) runs OK.

AVR Version 4.0a (with the fix commented out)

(fib 16) runs OK, (fib 17) crashes.

(tak 18 12 6) crashes.

AVR Version 3.6

(fib 18) runs OK, (fib 19) crashes.

(tak 18 12 6) runs OK.

The history

While running the benchmarks for the latest releases of uLisp I discovered that the AVR version of uLisp couldn’t run the tak function, with the parameters (tak 18 12 6), without crashing, presumably due to stack overflow. I verified that the previous version, 3.6, ran (tak 18 12 6) quite happily.

My initial theory was that the newer version was somehow using more stack space for each recursive call of the function, thus using up the stack more quickly. I therefore added the following debugging routine to print out the address of the local variable dummy, which would be created on the stack, and therefore this would give a guide to the address of the top of the stack at each recursion:

void print_stack_pointer () {
  int dummy; Serial.println((int)&dummy);
}

I added a call to print_stack_pointer() in the closure() routine, which gets called for each uLisp function call, to print the stack pointer at every invocation of tak .

I found that, as I suspected, version 4.0 Beta of uLisp did use slightly more stack space on each call of tak. However, I found that adding this debugging statement caused (tak 18 12 6) to run to completion without running out of stack. Furthermore, the fix worked even if the call to print_stack_pointer() was in part of the closure() routine that never actually got executed in this case.

Further experiments showed that simply putting the magic line:

int dummy; Serial.println((int)&dummy);

within some routine, even if it was never executed, causes the same beneficial effect.

I know it’s not unusual for a C program to work when you add debugging statements. Usually it’s because the compiler has optimised out part of your program that doesn’t need to be calculated within an inner loop, but adding the debugging statement makes the optimisation not possible. However in this case I’m at a bit of a loss to explain what’s going on.

I’ve compared the disassembly generated by the versions of uLisp with and without the magic line, and I can’t see any obvious reason for the difference.

My best explanation is that without the magic line the compiler optimises the code in a way that runs faster, but uses more stack space for saving the context in recursive calls, and adding the magic line prevents it from being able to do this.

System details

I was compiling uLisp using Arduino IDE 1.8.13 on a MacBook Pro running High Sierra. The Arduino IDE is using the built-in Arduino AVR Boards core, version 1.8.3.

The compiler version is avr-gcc/7.3.0-atmel3.6.1-arduino7 .

fe2o3 · 2021-07-17 03:37:25 UTC

Have you tried Arduino IDE 1.8.15?

johnsondavies · 2021-07-17 07:12:45 UTC

Have you tried Arduino IDE 1.8.15?

Just tried it - exactly the same results.

Kaef · 2021-07-23 08:34:26 UTC

Which compiler options are set by default? The behavior may change depending on the optimization strategy set.

There are some compiler options depending the stack usage (like -fstack-usage, -fno-defer-pop, -fcombine-stack-adjustments, -fconserve-stack) which may help to reduce the stack usage, but I have no experience using them.

johnsondavies · 2021-07-23 09:08:09 UTC

That’s a good suggestion. Including the “magic” line may have the side effect of prompting the compiler to choose a different compiler option which conserves stack usage. I’ll investigate…