I was initially planning to have everything just use char values, but it turned out that I had to change more places to guard against overflow than to simply let the full width bleed all the way through.
Perhaps. The problem is that UTF-8 was designed to make Unicode transparent to 8-bit clean transports like UART. Most Unix terminals have started using UTF-8 as the default encoding. Right now, uLisp looks like it handles Unicode strings:
3071> (let ((str "🙂")) (format t str))
🙂
nil
Aside from the extent to which it confuses the cursor position, I mean. However:
3071> (length "æði")
5
3071> (char "æ" 0)
#\U+c3
3071> (char "æ" 1)
#\U+a6
æ is U+E6, and is in ISO 8859-1. I'd have to explicitly configure the terminal to use that encoding, though. Instead, UTF-8 makes it look like it just works, unless I specifically go looking. The highest code point that gets encoded as a single byte by the terminal is #x7F. If it gets sent a lone byte higher than that, it's probably going to replace it with an error character. The code I added sends an #\U+ escape for everything that's not ASCII, so that issue is avoided. (I might point out that £ is U+A3, if you want to see what happens.)
Text is not 8-bit nowadays.