Book about writing a Lisp interpreter

Aeneas · 2020-07-01 17:52:15 UTC

Oh, and you might consider “writing a book” (more specifically, AGAIN, as I actually have one by you somewhere around, “Practical Programs for the BBC Computer and Acorn Atom”).

The idea of the book would be: “writing a primitive Lisp interpreter”, where you guide your reader through the process. Perhaps even “stop early”, you might even just show how to do recursion and omit looping entirely. There are such books, but … well, frankly, I see you as more brilliant than the others, and I would care more about your book than about theirs.

johnsondavies · 2020-07-02 14:16:08 UTC

Even if I could write such a book, I’m not sure that many people are interested enough in writing their own Lisp interpreter to read it!

However, I have been thinking about adding some more sections to the Implementation section of the uLisp site, which could describe some of the most interesting parts of uLisp. Are there any specific areas you’re interested in?

Aeneas · 2020-07-03 09:08:33 UTC

Dear David,

I can only speak “what I would do if it were me”, but I would do it the following way - with the strong caveat that I have NEVER written a lisp interpreter:

My basic premise is: HOW EXACTLY the incredible flexibility of Lisp expressivity is translated into mundane boxed numbers in memory of a machine is a real mystery. Lisp is a language of the mind whereas C is a language of the machine. This conceptual difference is what is to be pointed out, not just “what is a pointer to what”. The mere fact that you operate with pointers is due to the conceptual idea that in general, in Lisp, you DO NOT CARE where something is in memory, so you will see NO “arrays in memory”. (Do not mention yet eq.) You want to express a process in the mind, in a sort of conceptual algorithm, NOT a “machine algorithm” - this is the real strength of Lisp. - Direct yourself thus to those “generally acquainted with Lisp” and “wanting Lisp”. Utter novices don’t build interpreters. But assume that the target readers know how to USE Lisp at least in a passing way, not how to MAKE Lisp. And assume they are more interested in Lisp than in C (a common “mistake” is to think the audience “loves C”… a general acquaintance with C can be assumed, but more like introductory school-kid-level … as this is the level of most adults, too). Hence, I would NOT explain them “sections in a long program”, but rather do a series of little self-contained programs which each show how “it grows”. Because that is the way, I assume, YOU did it, too, no reasonable person ever sits down and says, “I will now sit down and write 5000 LoC, boy, I hope it will work in the end. But I will explain it to myself surely very nicely while I do it.” Nobody ever cares all that much about a Unix manpage for a config file, everybody wants sample config files for different use-cases. So each of the following is assumed to be in a little self-contained program, perhaps building on the one preceding, perhaps not. Whereever things cannot be expressed “cleanly” in Lisp concepts, some “trickery” is permitted, as SIMPLICITY of the explanation is MORE important than conceptual cleanliness and perfection.
Explain what the “basic unit” is, the “cell”: that thing which is either a symbol, or a number, or character, or a string… In C, these things are all different, and they are expressed differently. Explain what construct you use that can figure out its own nature as the one or the other “type” of thing. - Show different assignments to the cell. - Show how NIL is represented.
Do the “Hello World” with the cell: How it decides what it is and how it prints itself to the console.
Let cells interact with each other or not. Particularly, show how immediate numbers can be operated upon. Further, show how a symbol can point to a number, and how two symbols can thus be “added”. Show how equality of two “things” is determined. Show how numbers are always eq and symobols are always eq. (Do not yet foreshadow that this simple eq will get more complex in the future… people get curious, and the “many ways of equal” in Lisp are NOT a good thing to start with. Keep it simple.)
Show how a combination of these “things” is made. Show how things can be “consed”. Print a cons.
Show how a cons can be “taken apart again” - introduce CAR and CDR. Show how conses can become quite involved, in a sort of “cons tree”. Demonstrate CAAR, CDDR, CADR and CDAR.
Show how a thing can be consed with NIL. Show that conses with NIL operate differently than “normal” conses in so far as you can take CAR and CDR of NIL and get NIL, whereas if you cons NIL with NIL, you do not get NIL. Show that a cons of an element with NIL is printed differently - a list.
Show that more than one thing can be consed - the nature of lists. Point out how DIFFERENT things can be put in ONE list. Print a list. (Yes, I am totally annyoing with this printing stuff, yet it is a very obvious way to do debugging. READING is not important, all can be pre-set for a “batch operation” - PRINTING is.) Point out that there will be a lot of operations which will work on a list, but not on a cons. (You append and reverse lists, not conses, and for a novice, this will not be obvious.)
Show how different and involved structures can exist, and that conses and lists can be mixed, constructed and deconstructed.
Show how ONE operation can be applied to ALL elements on a list. Show how an entire list could be transformed, e.g. a number added to each number, a symbol consed with each other symbol or thelike, giving a list again. Show how an entire list could be used to compute a single result, i.e. the list-to-one-result type of operation.
Introduce a few operators that “can do elementary things with lists”, perhaps append and reverse them, and extend a bit the repertoire above. Somewhere here it would be nice to explain “what happens with the parts that were already computed” and “where the parts are that are still awaiting computation”. (In a for-loop, you know that rather exactly: some of the things have passed through your index, some have not. And you know that novices have a real issue with a for-loop and with what has passed through its index already and what has not, i.e. when the termination condition triggers.)
Show that these operations can be “attached” to the first element of the list. That a list looks sort of like a “tadpole” with a rather important head. Mention that it is this property which makes a list more interesting than the usual cons, and how interesting is that its LAST element is in a way its “most important” element.
Show how to decide whether to “activate” a list and compute something due to its “head” or whether to just treat it as a series of cells - introduce quote. Print a “computed” list and print a “quoted” list (even if this is the same operation and only the results look differently; it is also important to see that the “print” itself is not “magical”, but that everything happens BEFORE it).
Show truth and falsity. How SOMEHOW, quite ARBITRARILY, it is decided that NIL is false. (In Scheme, after all, it is not.) Point out how this is convenient, but admit that it is not “the only way to see things”. Based on the above, show how a condition could be made to work: if something is true, print something else, if it is not, do not print that other thing.
Show how COND works. First, a basic decision of two things. Then, more things. Show the problem of not having a default result. Show what happens with a default result. This can, perhaps, be broken apart into smaller steps, I am not sure exactly how.
Show how COND can be made part of a computation with lists: when a thing happens, compute with the one list, when another thing happens, compute with another list, when a third thing happens, compute with that second list something differently (i.e. with another head).
Combine two CONDs - one leading to the other, i.e. COND as a sub-condition.
“Name” things - introduce SETQ. Show how eq can operate on two “things” that have been “named”. Show how operations can be done on things that have been “named”, e.g. consing, subtraction and addition and thelike. Perhaps explain the difference between DEFVAR and SETQ, but really, do not get carried away with this, as many early and important Lisp-programs did never bother with such subtleties. Show how some things are “named by default”, such as numbers and NIL. Show how text strings are treated as symbols, both in naming and in reference: how a string can be a symbol for a symbol, how it can be a symbol for a number, how it can be a symbol for itself, and how it is translated into a pointer for internal computation.
“Name” functions. Show how a “named” function, e.g. addition of 1, can operate on a number directl and on a symbol pointing to a number. Show how a named function can operate on several arguments, not just one. Show how it can operate on a list, a list of things “directly” and a list of symbols pointing to things. Show how this is obvious for some functions, like addition and multiplication, yet not for others, like subtraction and division, and talk about evaluation order and precedence. Show symbol functions, like consing each element of a list with another element, or attaching all symbols of the list together and to nil - LIST. - Basically, show ALL interesting ways a function can just unconditionally operate on a list of
Introduce simple functions that contain a COND. No lists yet. First, show, if the argument is 10, print ‘X’, otherwise print BANANA. Show if the argument is BANANA, print 10, else print ‘X’ - i.e. that these functions are NOT necessarily doing things with the same type of arguments and results. Again, show it directly and indirectly, that is e.g. evaluate 10 directly or the symbol MYNUMBER which points to 10.
Introduce functions which contain COND and operate somehow with lists, again, directly and indirectly in all four ways: lists with immediate values submitted immediately, e.g. (1 2 3 4); lists with indirect values submitted directly, e.g. (A B C D); lists with direct values submitted indirectly, e.g. X, where X is (1 2 3 4); and lists with indirect values submitted indirectly, e.g. Y, where Y is (A B C D).
Introduce simple functions which do recursion, e.g. showing how a multiplication of a range is computed. Show how, when the range begin with 1 and ends with N, this is a factorial. Recursion is a big thing and everybody hates it at first, and once it “clicks”, it is even loved (because with a recursive function you actually need to do LESS and not MORE while you use it: you no longer pay attention how the conditions exactly interlock with loops, you do not need to “break out” of loops, and your looping conditions do not need to become an interwined mess - instead, you can “jump out” of the recursion at any moment you weeks later decide to add). Show how a sum is computed.
Show how recursive functions can operate with lists - get lists and give a list. Show how functions can turn a list into a simple value, a simple value into a list (I actually like to “kick the tires” of Lisp systems by testing them with (defun generate-symbol (n) (cond ((zerop n) '()) (cons 'X (generate-symbol (- n 1))))) and calling this with (length (generate-symbol)), which gives you a general feeling of memory capabilities). Point out that a function can give only ONE result, and that if TWO or more results are desired, a function can LIST those results. Show, therefore, how a function can operate either with a multitude of arguments, A, B, C and D, or with ONE argument - the list (A B C D).
Explain the loop construct as an alternative to recursion. Compare how certain loops and recursions lead to the same results, even though they look differently and are implemented differently.
Talk about “different kinds of equality” (goodness, how funny that sounds! :D). Talk about how you can reliably check whether two symbols EQ, but NOT whether two LISTS EQ, and concoct a way to test for equality of lists. (It is infact quasi one of my first definitions in every uLisp program - I just much prefer EQUAL.)
Talk about ephemeral values, e.g. from intermediate results - how they can fill up your memory. Mention how some device must be employed to “get rid of them”. First, talk about what happens when A was assigned the value 1, and then A was made to point to BANANA, and then was given the value ‘#’. What happens with 1 and BANANA, “Where are they?” - Introduce garbage collection. Whatever you do, keep it SIMPLE: the EFFECTIVENESS of the garbage collection is SECONDARY to the simplicity in understanding. (This is just like with sorting… there are all these wonderful algorithms which nobody ever cares about, and most people either “copy it from somewhere” or do the naivest thing and hope their bosses will never find out because it will compile somewhere deeply and it will be hard to find. One should be realistic. Perhaps point to resources to the more complex stuff, but do not actually use it.)
The internals are largely done. Here, it is time to put in a reader: how to “get” what the user does, evaluate and print it - the REPL. I/O is normally quite a large part of any interpreter, and the most UNINTERESTING from the reader’s point of view, as it handles mundane tasks like “when is a symbol finished” and thelike.

28: Combine reading, evaluation and printing - and let the reader have a REPL.

29: Show how to “save the environment” ( :D ). And how to load it again. Maybe in a text-readable file. It does not need to be “Lisp-readable”, i.e. “pasteable in a REPL”, but just so “not everything is lost when you quit the machine”. Perhaps show how to print all present definitions and all present variable values, for “pasting on later” or for later critique. Well, so far, you are pretty much DONE, aren’t you? :D - That alone is quite ENOUGH for a nice little list interpreter. - So the rest is really “optional”, and could treat first, file I/O, strings, and then, arrays.

Anyway, if I had your absolutely amazing skills, that is how I would approach it! :) It is just an idea, of course, and with ideas with someone ELSE’s time one has to be very cautious, so I would like to express my full respect for you and what you do, and that I very much assume you will reject this. “What I had in mind” while writing it was the usual informatics course at school, where the teachers really are unsure what to teach you, and this would be like a nice semester-course of “actually DOING something”. (I never had it at school, I know it from my friends. They used Delphi to compute confidence intervals and express it colorful columns with a modern GUI. In parallel, privately, I made a game in QBASIC where tanks were shooting each other up in a randomly generated desert landscape… they were more motivated by THIS than by the statistics. They wanted the computer to “live”, not just to “display”. And I think Lisp is form of life for IDEAS.)

P.S.: I totally forgot “LET”. It is not strictly necessary, either, but I would put it around “saving the environment”, before or after, together with the other extensions. Otherwise, one can actually use an “inner SETQ” or just combine functions.

odin · 2021-01-27 13:00:08 UTC

I know I’m reviving a rather old thread here, but so be it…

Please do! I found those sections very informative. The largest ‘missing piece’ is the details of the main lookup table and its three components. I also think there’s no section on arrays right now, which would be interesting to have.

The targeting is very different, but isn’t this kind of what SICP is all about?

johnsondavies · 2021-01-27 13:16:46 UTC

Thanks for the two suggestions! I’ll make a start on describing the main lookup table, and how arrays are implemented.

johnsondavies · 2021-01-27 18:35:01 UTC

Here’s a first draft of the lookup-table and arrays sections. I welcome any comments, or suggestions of information I could usefully add:

The symbol table

Arrays

odin · 2021-01-28 11:24:15 UTC

Those look good. I have to admit I didn’t expect to see them that quickly.

johnsondavies · 2021-01-28 13:00:38 UTC

I’d already written bits of them…

I’ve added them to the Implementation section in the contents, and renamed The symbol table to Built-in symbols as it fits in better with the other sections.