I was initially planning to have everything just use char values, but it turned out that I'd have to change more places to guard against overflow than to just let the full width bleed all the way through.
Perhaps. The problem is that UTF-8 was designed to make Unicode transparent to 8-bit-clean transports like a UART, and most Unix terminals now use UTF-8 as their default encoding. Right now, uLisp looks like it handles Unicode strings:
3071> (let ((str "🙂")) (format t str))
🙂
nil
Aside from the extent to which it confuses the cursor position, I mean. However:
3071> (length "æði")
5
3071> (char "æ" 0)
#\U+c3
3071> (char "æ" 1)
#\U+a6
æ is U+E6, which is in ISO 8859-1, but I'd have to explicitly configure the terminal to use that encoding. With UTF-8 it instead looks like everything just works, unless I specifically go looking. The highest code point the terminal encodes as a single byte is #x7F; if it receives a lone byte above that, it will most likely render it as a replacement character rather than the character I meant. The code I added sends an #\U+ escape for everything that's not ASCII, so that issue is avoided. (I might point out that £ is U+A3, if you want to see what happens.)
Text is not 8-bit nowadays.