Representing a string as a vector of characters?


#1

Good to see strings getting some love!
I recall it being somewhat frustrating to treat them as sequences a while back.
I think quite a few Lisps treat them as a vector of chars so anything that works on a vector will work on a string.
Clojure does a better job at blurring the lines between list and vector such that any sequence function will work on either. It’s good to have some new flexibility with your additions though!

I recall a time with ye olde Lysps where you could only explode symbol names into chars, modify those, and implode them back into symbols. Now THAT was crufty. (But as a side effect, you had the entire language backing working with strings as they became a list).


uLisp 4.4 released today
#2

I thought about using a vector of characters as the string representation, but that would have been a much less efficient way of representing strings. For example, on a 32-bit platform:

(defvar a '(#\a #\b #\r #\a #\c #\a #\d #\a #\b #\r #\a))

takes 100 bytes, but:

(defvar b "abracadabra")

takes 28 bytes.

I too remember using an old Lisp that had EXPLODE to turn a symbol/string into a list of characters, and IMPLODE to convert them back to a symbol. They’re not standard Common Lisp so I haven’t implemented them in uLisp, but you could write them as:

(defun explode (str)
  (if (zerop (length str)) nil
    (cons (char str 0) (explode (subseq str 1)))))

(defun implode (lst)
  (if (null lst) ""
    (concatenate 'string (string (car lst)) (implode (cdr lst)))))

For example:

> (explode "kitten")
(#\k #\i #\t #\t #\e #\n)

> (implode '(#\k #\i #\t #\t #\e #\n))
"kitten"

#3

Okay that’s a very good approach! Essentially with those two we get all of the List editing functions applicable to strings. Thanks for the trick. For very long strings it might be good to have an optional buffer that you save the intermediate result into as a parameter so you can do TCE. Otherwise it seems that cons and concatenate have to expand everything before they can finally do their job.


#4

What’s TCE?


#5

Tail call elimination.
I think you have it in uLisp, essentially it makes the compiler turn a recursion into a loop so you can go on forever without blowing the stack. The prerequisite (usually) is that there is nothing more to be done in the recursion before the next round. So there are no computations being accumulated from recursion cycle to recursion cycle.

Below you can see that I’m consing up a temporary list in the buffer and passing that on so cons can do it’s job immediately. In Lisps that support tail call elimination this can handle huge lists without blowing the stack. Each iteration around leaves no additional computations to be done and so the compiler can optimize.

(defun safe-reverse (list &optional buffer)
  (if (null list)
      buffer
      (safe-reverse (cdr list)
                    (cons (car list) buffer))))

Now, if you just wanted the normal order back you’d (reverse buffer) at the very end. :)
And the whole thing becomes a generic looping construct. I didn’t have uLisp handy as I’m at work so I’m not sure if a different approach would have to be done… So, long story short, I’m happy to have the full power processing text as lists.


#6

Oh yes, I call it Tail-call optimization.