[PATCH] open_memstream() and open_wmemstream()

Fri Feb 15 15:10:15 UTC 2013

On Wed, Feb 13, 2013 at 11:44:19AM -0500, John Baldwin wrote:
> On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote:
> > On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote:
> > > I've written an implementation of open_memstream() and
> > > open_wmemstream() along with a set of regression tests.  I'm pretty
> > > sure open_memstream() is correct, and I believe open_wmemstream() is
> > > correct for expected usage.  The latter might even do the right thing
> > > if you split a multi-byte character across multiple writes.  One
> > > question I have is if my choice to discard any pending multi-byte
> > > state in the stream anytime a seek changes the effective position in
> > > the output stream.  I think this is correct as stdio will flush any
> > > pending data before doing a seek, so if there is a partially parsed
> > > character we aren't going to get the rest of it.

> > I don't think partially parsed characters can happen with a correct
> > application. As per C99, an application must not call byte output
> > functions on a wide-oriented stream, and vice versa.

> Oh, interesting.  Should I remove those tests for using byte output
> functions from the open_wmemstream test?

Yes. Unless real applications are identified that use this undefined
behaviour, such a test would test the implementation and not the
specification.

I briefly tried to run the tests against the glibc implementation
(Ubuntu 10.04) and got many failures. Glibc does not check the
parameters for null pointers and simply stores them blindly which is
POSIX-compliant. Things like bad length 6 for "foo" appear to be glibc
bugs, and test-open_wmemstream even segfaults.

> > Discarding the shift state on fseek()/fseeko() is permitted (but should
> > be documented as this is implementation-defined behaviour).
> > State-dependent encodings (where this is relevant) are rarely used
> > nowadays.

> > The conversion to bytes and back probably makes open_wmemstream() quite
> > slow but I don't think that is very important.

> Note that we do this already for swprintf().  I've added this note:

> IMPLEMENTATION NOTES
>      Internally all I/O streams are effectively byte-oriented, so using wide-
>      oriented operations to write to a stream opened via open_wmemstream()
>      results in wide characters being expanded to a stream of multibyte char-
>      acters in stdio's internal buffers.  These multibyte characters are then
>      converted back to wide characters when written into the stream.  As a
>      result, the wide-oriented streams maintain an internal multibyte charac-
>      ter conversion state that is cleared on any seek opertion that changes
>      the current position.  This should have no effect unless byte-oriented
>      output operations are used on a wide-oriented stream.

This appears to "approve" the use of byte-oriented operations on a
wide-oriented stream which is in fact undefined.

> > > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch

> > The seek functions should check for overflow in the addition (for
> > SEEK_CUR and SEEK_END) and the conversion to size_t.

> Note that SEEK_CUR is effectively only called with an offset of 0
> via __ftello().  I'm not sure of the best way to check for overflow, do I have 
> to do something like this:

> static int
> will_overflow(size_t off, fpos_t pos)
> {
> 	if (pos < 0)
> 		return (-pos > off);
> 	return (SIZE_MAX - off > pos);
> }

> For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and
> for SEEK_END 'will_overflow(ms->len, pos)'

> Presumably SEEK_SET should test for the requested position being
> beyond SIZE_MAX as well?

> If my above test is ok then I end up with this:

Hmm, perhaps it is better to restrict to SSIZE_MAX instead of SIZE_MAX
as this greatly simplifies the code. With the above code, comparisons
may be signed or unsigned and there may be no greater type that fits
both a size_t and an fpos_t. Negating a negative fpos_t is also
incorrect because it may overflow. If you restrict the memory buffer to
ssize_t, an ssize_t can be safely converted to an fpos_t.

> static int
> will_overflow(size_t offset, fpos_t pos)
> {
> 	if (pos < 0) {
> 		if (-pos > off) {
> 			errno = EINVAL;
> 			return (1);
> 		}
> 	} else {
> 		if (SIZE_MAX - off > pos) {
> 			errno = EOVERFLOW;
> 			return (1);
> 		}
> 	}
> 	return (0);
> }
> 
> static fpos_t
> memstream_seek(void *cookie, fpos_t pos, int whence)
> {
> 	struct memstream *ms;
> #ifdef DEBUG
> 	size_t old;
> #endif
> 
> 	ms = cookie;
> #ifdef DEBUG
> 	old = ms->offset;
> #endif
> 	switch (whence) {
> 	case SEEK_SET:
> 		if (pos > SIZE_MAX) {
> 			errno = EOVERFLOW;
> 			return (-1);
> 		}
> 		ms->offset = pos;
> 		break;
> 	case SEEK_CUR:
> 		if (will_overflow(ms->offset, pos))
> 			return (-1);
> 		ms->offset += pos;
> 		break;
> 	case SEEK_END:
> 		if (will_overflow(ms->len, pos))
> 			return (-1);
> 		ms->offset = ms->len + pos;
> 		break;
> 	}
> 	memstream_update(ms);
> #ifdef DEBUG
> 	fprintf(stderr, "MS: seek(%p, %zd, %d) %zd -> %zd\n", ms, pos, whence,
> 	    old, ms->offset);
> #endif
> 	return (ms->offset);
> }

> Thanks for reviewing this so far.

> -- 
> John Baldwin