Better output from Builder


#1

It would certainly make the GitHub diffs a lot easier to understand if the lists of symbol strings and doc strings had nice names.

Is there a reason why the Builder still outputs not-very-useful names like string122?

It would be a lot easier to understand and modify if the variables were instead called something like string_withsdcard.

The transformation from Lisp name to C++ name doesn’t have to be too complicated; the simplest rule is probably to replace each * with _ and remove each -. For example, for *features* the variable would be called string__features_, which is a lot clearer than string19.
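A minimal sketch of that rule, written in C++ purely for illustration (the Builder's own code isn't shown in this thread, and `lispToCName` is a hypothetical name). It adds the `string_` prefix, turns each `*` into `_` and drops each `-`. As an assumption beyond the rule above, it returns an empty string for any other non-alphanumeric character, since builtins such as `1+` or `<=` would need a separate naming scheme.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: map a Lisp symbol name to a C++ variable name using
// the simple rule proposed above. Returns an empty string when the name
// contains a character the rule does not cover (e.g. "1+" or "<="), so the
// caller can fall back to some other scheme for those builtins.
std::string lispToCName(const std::string& lispName) {
  std::string result = "string_";
  for (char c : lispName) {
    if (c == '*') {
      result += '_';             // *features* -> _features_
    } else if (c == '-') {
      continue;                  // with-sd-card -> withsdcard
    } else if (std::isalnum(static_cast<unsigned char>(c))) {
      result += c;               // letters and digits are kept as-is
    } else {
      return "";                 // not handled by this simple rule
    }
  }
  return result;
}
```

One thing a generator using this rule would probably want to check: because `-` is simply dropped, two different Lisp names could in principle map to the same C++ name, so the generated names should be tested for uniqueness.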

What do you think?


#2

Yes, I could easily do that; I didn’t realise that it would make a difference to anyone. I will try and make the change before I generate the next set of source files.


#3

While you’re at it, I just thought of another code change that might help people who want to modify uLisp (which, judging from this forum, is about half of its users):

Instead of having four separate sections in the source, for the functions, the symbol strings, the docstrings, and the lookup table, like this:

// ----- functions section ---- //
object* fn_foo(object* args, object* env) {
    // ...
}

object* fn_bar(object* args, object* env) {
    // ...
}

// ----- strings section ---- //
const char string_foo[] = "foo";
const char string_bar[] = "bar";

// ----- docstrings section ---- //
const char doc_foo[] = "Doc for foo";
const char doc_bar[] = "Doc for bar";

// ----- table section ---- //
const tbl_entry_t lookup_table[] = {
    { string_foo, fn_foo, 0123, doc_foo },
    { string_bar, fn_bar, 0156, doc_bar },
};

you could interleave the strings and the tbl_entry_t entries with the functions, which would reduce the number of sections someone has to scroll back and forth between from four to two:

// ----- functions section ---- //
object* fn_foo(object* args, object* env) {
    // ...
}
const char string_foo[] = "foo";
const char doc_foo[] = "Doc for foo";
tbl_entry_t tb_foo = { string_foo, fn_foo, 0123, doc_foo };

object* fn_bar(object* args, object* env) {
    // ...
}
const char string_bar[] = "bar";
const char doc_bar[] = "Doc for bar";
tbl_entry_t tb_bar = { string_bar, fn_bar, 0156, doc_bar };

// ----- table section ---- //
const tbl_entry_t lookup_table[] = {
    tb_foo,
    tb_bar,
};

A quick test showed that doing this doesn’t increase the executable size.

I would have gone one step further and inlined everything, including the function itself, by using a C++ lambda, but strangely this added about 200 bytes to the executable. I’m not sure why.
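For reference, the lambda variant might look roughly like this. It relies on the standard C++ rule that a lambda with no captures converts implicitly to a plain function pointer, so it can initialize the function field of a table entry directly. The `object` struct, `fn_ptr_type` and this `tbl_entry_t` layout (name, function, minmax, doc, following the snippets above) are simplified stand-ins, not uLisp's real definitions, and the sketch says nothing about where the extra 200 bytes came from.

```cpp
#include <cassert>
#include <stdint.h>

// Simplified stand-ins for uLisp's types (assumptions, not the real ones).
struct object { int value; };
typedef object* (*fn_ptr_type)(object* args, object* env);

struct tbl_entry_t {
  const char* string;   // Lisp name
  fn_ptr_type fptr;     // implementation
  uint8_t minmax;       // packed type/min/max, written in octal
  const char* doc;      // documentation string
};

const char string_foo[] = "foo";
const char doc_foo[] = "Doc for foo";

const tbl_entry_t lookup_table[] = {
  // The function body lives inside the table entry: a captureless lambda
  // converts implicitly to fn_ptr_type.
  { string_foo,
    [](object* args, object* env) -> object* {
      (void)env;     // unused in this toy example
      return args;   // placeholder body: just returns its arguments
    },
    0123, doc_foo },
};
```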


#4

I think there are pros and cons to doing this. One disadvantage is that it will make the source longer, and is it really so difficult to refer to three different sections?

Would anyone else find this useful?


#5

Longer does not necessarily mean less readable. As it stands, each function is made up of those four parts (ignoring the BUILTINS enum, but that’s a bit harder to get rid of), so why not put them next to each other so that as many of the parts as possible fit on one screen? My main concern was scrolling: even the shortest section (the strings) is several screenfuls tall because of the sheer number of functions, and scrolling takes time proportional to the distance scrolled. Ctrl+F is faster but still takes time, and is a little more awkward if you’re using the 1.x branch of the Arduino IDE.


#6

Yes, I generally agree with that. Let’s see if anyone else would find this useful …


#7

I’d propose a slight alternative, or compromise, where we group the functions, strings, and docs together but keep the table as it is now. So it would look like:

// ----- functions section ---- //
object* fn_foo(object* args, object* env) {
    // ...
}
const char string_foo[] = "foo";
const char doc_foo[] = "Doc for foo";

object* fn_bar(object* args, object* env) {
    // ...
}
const char string_bar[] = "bar";
const char doc_bar[] = "Doc for bar";

// ----- table section ---- //
const tbl_entry_t lookup_table[] = {
    { string_foo, fn_foo, 0123, doc_foo },
    { string_bar, fn_bar, 0156, doc_bar },
};

It involves more jumping around than standalone table entries, but it doesn’t make the code any longer.

I do think grouping the functions, names, and docs together would make the code easier to work with. I rarely want to read the docs or names sequentially, but I always want to see the name and docs of the function I’m working on, and it’s sometimes annoying because finding the related parts means searching for several different terms, e.g. copy-list -> string94 -> fn_copylist. Searching for copy-list won’t find fn_copylist (though I think the first proposal, giving the strings better names, would help with that).


#8

That would work too. The only reason I suggested giving the table entries variable names is that it makes getting rid of the BUILTINS enum a bit easier (although I haven’t worked out the details yet).


#9

Actually (sorry for the double post), I looked at this again and there is an advantage to giving the table entries variable names: the minmax value also sits next to the function. That is fairly important when you’re working on the function.


#10

Yeah, I considered that, but I don’t think putting the minmax value next to the function is worth the longer code. I find the documentation string shows well enough what the function does, so you don’t need to look at the minmax, and in my experience it’s rarely edited.


#11

@hasn0life, do you agree with the first suggestion, to give the strings more useful names? For example:

const char string_defun[] = "defun";

instead of the current:

const char string19[] = "defun";

#12

Yes, I think it would help when searching for what the functions are called. If the index is important, you could probably also generate a comment that shows it, perhaps something like:

/*19*/ const char string_defun[] = "defun";
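A minimal sketch of how a generator might emit such a line, in C++ for illustration only. `declareSymbolString` is a hypothetical helper; it assumes the C++ name has already been derived from the Lisp name, and that the Lisp name contains no quote or backslash characters that would need escaping.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator step: emit one symbol-string declaration with the
// table index kept as a leading comment, for example:
//   /*19*/ const char string_defun[] = "defun";
std::string declareSymbolString(int index, const std::string& cName,
                                const std::string& lispName) {
  return "/*" + std::to_string(index) + "*/ const char " + cName +
         "[] = \"" + lispName + "\";";
}
```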


#13

There’s a new release of uLisp and the names are still in the string122 format, and the diff of the string, doc, and table sections is massive even though barely anything changed there (only 5 new functions were added, out of about 200).


#15

I decided against doing it as I could see some downsides to what you suggested, but I can provide you with a special version of the source which will keep the numbering of the previous version, to simplify updating your uLisp fork.


#16

Can you describe the downsides of replacing the numbers with function names? I do think the names are better, and I don’t believe you mentioned any downsides in this thread.


#17

My main objections are that it will make the source more verbose and, for me, harder to follow. Also it is a change that requires work to implement and test, and I don’t believe it will give more than a minor benefit to perhaps a couple of users who maintain a fork of uLisp. But correct me if I’m wrong.


#18

Any reason why we don’t include the function names and docs inline in the lookup table directly? The compiler will fill in the struct appropriately, and then we get rid of a lot of boilerplate and jumping around.

const tbl_entry_t lookup_table[] = {
  { "defun", sp_defun, 0327, "(defun name (parameters) form*)\n"
                             "Defines a function." },
  { "defvar", sp_defvar, 0313, "(defvar variable form)\n"
                               "Defines a global variable." },
  { "eq", fn_eq, 0222, "(eq item item)\n"
                       "Tests whether the two arguments are the same symbol, same character, equal numbers,\n"},
...
};

If the doc strings are too long and cumbersome you can also replace them with a macro

#define EQ_DOC "(eq item item)\nTests whether the two arguments are the same symbol"
...
{ "eq", fn_eq, 0222, EQ_DOC }

This would make things tidier and keep everything in one place, with no need to come up with variable names for anything except the C functions.
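A self-contained sketch of the inline style, using simplified stand-in types and placeholder function bodies (assumptions, not uLisp's real definitions; doc text shortened). It depends on two standard rules: adjacent string literals are joined into one at compile time, which is what lets a doc string span several source lines, and a macro that expands to a string literal can be used anywhere a literal can.

```cpp
#include <cassert>
#include <stdint.h>
#include <string.h>

// Simplified stand-ins (assumptions, not uLisp's real definitions).
struct object;
typedef object* (*fn_ptr_type)(object* args, object* env);

struct tbl_entry_t {
  const char* string;   // Lisp name
  fn_ptr_type fptr;     // implementation
  uint8_t minmax;       // packed type/min/max, written in octal
  const char* doc;      // documentation string
};

// Placeholder bodies; the real uLisp functions do actual work.
object* sp_defun(object* args, object* env) { (void)env; return args; }
object* fn_eq(object* args, object* env) { (void)env; return args; }

// Macro variant for a long doc string.
#define EQ_DOC "(eq item item)\n" \
               "Tests whether the two arguments are the same symbol."

const tbl_entry_t lookup_table[] = {
  // Adjacent literals on separate lines are joined into one string.
  { "defun", sp_defun, 0327, "(defun name (parameters) form*)\n"
                             "Defines a function." },
  // A macro that expands to a string literal works the same way.
  { "eq", fn_eq, 0222, EQ_DOC },
};
```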

On that topic, it would probably be better to create descriptive C macros for the function type and min/max argument counts, so that that part is easier to write and less error-prone.
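One possible shape for such macros. This assumes, from the octal values quoted earlier (`0327` for `defun`, `0313` for `defvar`, `0222` for `eq`), that the three octal digits hold the function type, the minimum argument count, and the maximum argument count, with 7 meaning "no limit". The names `MINMAX`, `FUNCTIONS`, `SPECIAL_FORMS` and `UNLIMITED`, and the type codes 2 and 3, are hypothetical.

```cpp
#include <stdint.h>

// Hypothetical encoding inferred from the octal constants in the table:
// bits 7-6 = function type, bits 5-3 = min args, bits 2-0 = max args.
enum fn_type { FUNCTIONS = 2, SPECIAL_FORMS = 3 };  // assumed type codes
#define UNLIMITED 7  // assumed marker for "any number of arguments"

// Packs the three fields into one byte; usable in constant expressions.
#define MINMAX(type, min, max) \
  ((uint8_t)(((type) << 6) | ((min) << 3) | (max)))
```

With these, a table entry could read `{ string_defun, sp_defun, MINMAX(SPECIAL_FORMS, 2, UNLIMITED), doc_defun }` rather than `{ string_defun, sp_defun, 0327, doc_defun }`; both spellings produce the same value, 215.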


#19

I personally just don’t understand the benefit of numbering the strings. To me it makes things more verbose, in that you have to look up and remember the number of the string as well as the function name. And I will argue that being easy to fork is a great feature of the language.

Also I really like @nanomonkey’s proposals, and would prefer them to mine.


#20

“Any reason why we don’t include the function names and docs inline in the lookup table directly?”

That would definitely be the ideal solution.

The original version of uLisp was for AVR processors. In that version, to save RAM, the strings are put in program memory by declaring them PROGMEM, and have to be read by calling pgm_read_byte(). I don’t believe it would be possible to implement that using your approach. However, I could use your approach for the other platforms.
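To illustrate the constraint with a sketch: with avr-libc, `PROGMEM` places a named array in flash, and its bytes have to be fetched with `pgm_read_byte()`. As I understand it, a string literal written directly inside a table initializer is not placed in flash even when the table itself is declared `PROGMEM`, which is why the AVR version needs named strings. The code falls back to ordinary memory reads when not compiling for AVR, so it also runs on a desktop; `copyName` is a hypothetical helper, not uLisp's own code.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>  // provides PROGMEM and pgm_read_byte()
#else
// Desktop fallback so the sketch also compiles and runs off-target:
// "flash" reads become ordinary memory reads.
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#endif

// Each named string can be placed in flash individually.
const char string_defun[] PROGMEM = "defun";
const char string_defvar[] PROGMEM = "defvar";

// Hypothetical helper: copy a flash string into a RAM buffer byte by byte,
// as AVR code has to. Truncates if needed and always NUL-terminates
// (when size is 0 nothing is written).
void copyName(const char *flashStr, char *buffer, size_t size) {
  if (size == 0) return;
  size_t i = 0;
  for (; i + 1 < size; i++) {
    char c = (char)pgm_read_byte(flashStr + i);
    if (c == '\0') break;
    buffer[i] = c;
  }
  buffer[i] = '\0';
}
```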

“If the doc strings are too long and cumbersome you can also replace them with a macro”

I think they would indeed be too long to fit tidily into the table, but I would still prefer to keep them as strings with meaningful names rather than macros; I don’t see any advantage in using macros.

“I will argue that being easy to fork is a great feature of the language.”

I appreciate that, and will experiment with these suggestions.


#21

I’ve built some test versions incorporating these suggestions; let me know what you think.

They are at:

https://github.com/technoblogy/ulisp-esp

https://github.com/technoblogy/ulisp-arm