with-output-to-string breaks when using other string functions inside


#1

Here’s an example.
This works, and correctly evaluates to “success”:

(with-output-to-string (s) (princ "success" s))

However, this returns an empty string:

(with-output-to-string (s) (string #\1) (princ "success" s))

The issue seems to be that with-output-to-string uses the GlobalString values to implement the stringstream, and relies on them remaining intact throughout the evaluation of all of its form arguments. However, many other functions in uLisp also call startstring() and thereby overwrite GlobalString (including string, format nil and a few others).

This behavior is quite surprising, because it is not obvious from the outside which functions will interfere with the stringstream.

A few possible directions for fixing this:

  1. Check in startstring() whether GlobalString is already set (non-NULL). If already set, error out. Using functions like string inside of with-output-to-string would fail, but this is much better IMHO than generating undefined behavior. This would also have the side-benefit that nested uses of with-output-to-string would be caught. However, all functions that use startstring() will need to be modified to unset the value when they’re done.
  2. Turn GlobalString into a stack (i.e. list?) to support nested usage. Would also require each function that starts a new one to pop it off the stack again before return.
  3. As far as I can tell, with-output-to-string is the only function that allows evaluating other forms while using GlobalString. Hence, it might also be enough to maintain a second set of GlobalString variables just for stringstream. with-output-to-string would use this separate set of GlobalString variables, while other string-related functions would use the current existing one. The puts functions for printing into a string would need to be wrapped into two versions, one for each global string.

There are probably some additional complications for some of these that I missed or haven’t thought through.

If you have a preferred direction for fixing this, I can try to take a stab at making a patch…


#2

@danielmewes This has been on my mind as something that needs addressing, so thank you for raising it, and for your analysis.

In response to your alternative suggestions:

In the short term I think it would be good to do your option (1), ie give an error when a function like string is used inside with-output-to-string. As you say, the behaviour at present is a silent failure, which is not desirable.

As a longer term solution I don’t think (3) would be sufficient, because you can in theory nest two or more with-output-to-string clauses, outputting to different strings. So I think (2) will be needed to cover all possible use cases. My preference would be to use a Lisp list, rather than a C stack, so the memory management will be handled automatically by uLisp.

I’ll work on (1) within the next week, and give some thought about how best to approach (2). Feel free to suggest a patch!


#3

Hi David, thanks for considering this bug and for sharing your thoughts on the potential fixes.

I’ve been thinking about this a little more, and I’ve actually been liking option 3 a lot more. We can still combine it with a check that makes sure that only one with-output-to-string string stream can be opened at a time. I’ll take a stab at implementing this, but definitely feel free to discard my patch if you feel like this is not the right direction.

The benefits of option 3 over option 1 are that a) we don’t need to modify all the places that use startstring() to reset the GlobalString variable afterwards, and b) we can actually support using string operations that internally use GlobalString within a with-output-to-string block.

Option 2 on the other hand has additional complexity if we actually want to support nested with-output-to-string, since we’d also need to associate the stream object to the entry in the GlobalStringList (or whatever that variable would be called). I guess we can just store an index into that list into the stream, as long as it’s smaller than 256 (or is it 128?)… but it adds even more complexity I think?
Of course I’d be happy if we have option 2 at some point, but I suspect option 3 solves 80% of the use cases that are currently broken. (it at least selfishly solves the one I have right now :D )

As far as I can tell from the code, only pstr is currently supported for string streams, while reading (gstr) is only used internally within other functions? This makes using a separate variable for with-output-to-string especially easy, since we’ll only need the equivalent of GlobalStringTail I believe.

I’m also realizing that there’s a much more insidious issue with some of the stream types, in that you can leak them out of the with-... blocks and writing to them afterwards might corrupt memory (or maybe even the filesystem in the case of with-sd-card? Not sure).

Something like this:

(defvar x)
(with-output-to-string (s) (setf x s))
; Bad! GlobalString is no longer on the GCStack and might have been garbage-collected.
(write-string "test" x)

I’m not sure how to best detect/stop this in general. Though a simple way to catch at least the most obvious cases like that could be to make sure that when we leave the with-… function and unprotect the GlobalString / close SDpfile / SDgfile etc., we set those global variables back to some invalid value (e.g. NULL for GlobalStringTail). Then the put/get functions for the stream type can check for this and fail.
This still allows writing to the wrong file/string if one gets creative in using multiple with-... blocks one after another, but I think that’s an even less likely edge case.
Any thoughts on this?


#4

I wasn’t totally sure which Github repo is the most “canonical” one for making patches against, so for now I’m just linking a patch file here.

Here’s a patch that implements option 3, with additional detection for streams that have gone out of scope.

The out-of-scope detection is implemented only for stringstreams right now, but a similar approach could be implemented for other stateful stream types I think (at least for with-sd-card). Some care needs to be taken to properly close and reset the streams in errorend(), in case an error is raised while a stream is open, as you can also see in this patch.

This particular diff is against the WASM port of uLisp, but other than the specific line numbers and file names hopefully should apply to the original ulisp.c variants.

I wasn’t sure what you prefer for the function naming. I left the existing pstr and gstr unchanged, and instead added a new pstrstream for writing to a stringstream as it comes out of with-output-to-string.

Here are some test cases to try out:

  1. Using string functions within with-output-to-string
(with-output-to-string (s) (string #\1) (princ "success" s))

Now returns

"success"
  1. Use after close
(defvar x)
(with-output-to-string (s) (setf x s))
(write-string "test" x)

now errors out with:

Error: 'write-string' stream has been closed
  1. Multiple open stringstreams
(with-output-to-string (s) (with-output-to-string (s2)))

Now errors out with

Error: 'with-output-to-string' stream already open