Proposal to store immediate values in the object cell


#21

I was initially planning to have everything just use char values, but it turned out that preventing overflow that way meant changing more places than simply letting the full width carry all the way through.

Perhaps. The problem is that UTF-8 was designed to make Unicode transparent to 8-bit clean transports like UART. Most Unix terminals have started using UTF-8 as the default encoding. Right now, uLisp looks like it handles Unicode strings:

3071> (let ((str "🙂")) (format t str))
🙂
nil

Aside from the extent to which it confuses the cursor position, I mean. However:

3071> (length "æði")
5

3071> (char "æ" 0)
#\U+c3

3071> (char "æ" 1)
#\U+bd

æ is U+E6, and is in ISO 8859-1. I’d have to explicitly configure the terminal to use that encoding, though. Instead, UTF-8 makes it look like it just works, unless I specifically go looking. The highest code point that gets encoded as a single byte by the terminal is #x7F. If it gets sent a lone byte higher than that, it’s probably going to treat it as an error and show a replacement character. The code I added sends an #\U+ for everything that’s not ASCII, so that issue is avoided. (I might point out that £ is U+A3, if you want to see what happens.)
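To make the byte-versus-character mismatch concrete, here is a minimal C sketch of counting code points in a UTF-8 string. The function name utf8_codepoints is hypothetical and is not part of uLisp; the sketch assumes the input is valid UTF-8 and simply skips continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of uLisp: count the Unicode code points
   in a NUL-terminated string, assuming it is valid UTF-8.
   Continuation bytes have the bit pattern 10xxxxxx, so every byte that
   does NOT match that pattern starts a new code point. */
size_t utf8_codepoints(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For "æði" as sent by a UTF-8 terminal, a byte-oriented length reports 5, while this count gives 3, because æ and ð each occupy two bytes.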

Text is not 8-bit nowadays.


#22

OK, I’m still a bit undecided about Unicode; is a partial solution going to be useful?

Apart from that, I think your improvements are a great addition to uLisp and I’d like to incorporate them in the next release. Is there anything else to be worked out, or is it ready to be merged in?


#23

Frankly, so am I. Putting bounds checks on code-char and char-code and restricting the #\U+ syntax to codepoints under 256 should be enough. It’s not a good idea to send bytes with values above 127 unfiltered, though. (And 127 kinda needs special treatment, too.)
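As a rough illustration of what that could look like, here is a C sketch with a range check and a printer that never emits a raw byte outside printable ASCII. The names valid_char_code and format_char are hypothetical and are not taken from my branch; treating space and control characters with the #\U+ form is also just an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bounds check for code-char style functions:
   accept only code points that fit in a single byte (0..255). */
int valid_char_code(long code) {
    return code >= 0 && code < 256;
}

/* Hypothetical printer: write the printed form of a character into buf.
   Printable ASCII (33..126) is written as #\c. Everything else,
   including 127 (DEL) and all bytes above it, is written as #\U+xx in
   lowercase hex, so no unfiltered high byte reaches the terminal.
   Returns the number of characters written, as reported by snprintf. */
int format_char(long code, char *buf, size_t size) {
    assert(buf != NULL && valid_char_code(code));
    if (code > 32 && code < 127) {
        return snprintf(buf, size, "#\\%c", (int)code);
    }
    return snprintf(buf, size, "#\\U+%lx", (unsigned long)code);
}
```

With this, code 0xC3 comes out as #\U+c3 and the letter a as #\a, while a caller would reject 256 or -1 before ever reaching the printer.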

I am not aware of anything - except the bounds checks - that should be changed from what’s in my branch on GitHub. I still need to get around to the symbols, though; that’ll touch a lot more places in the code than the characters did.


#24

I see you’ve provided a fixnump function. What might a user want this for (I’m thinking about documentation)?


#25

It’s mainly there to complete a set. Common Lisp doesn’t have one, but it does have typep. I’m not overly attached to it.


#26

Attached to typep or fixnump?


#27

fixnump. There’s no typep in uLisp, is there?