[PATCH] open_memstream() and open_wmemstream()
John Baldwin
jhb at freebsd.org
Fri Feb 15 15:52:23 UTC 2013
On Friday, February 15, 2013 10:10:12 am Jilles Tjoelker wrote:
> On Wed, Feb 13, 2013 at 11:44:19AM -0500, John Baldwin wrote:
> > On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote:
> > > On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote:
> > > > I've written an implementation of open_memstream() and
> > > > open_wmemstream() along with a set of regression tests. I'm pretty
> > > > sure open_memstream() is correct, and I believe open_wmemstream() is
> > > > correct for expected usage. The latter might even do the right thing
> > > > if you split a multi-byte character across multiple writes. One
> > > > question I have is if my choice to discard any pending multi-byte
> > > > state in the stream anytime a seek changes the effective position in
> > > > the output stream. I think this is correct as stdio will flush any
> > > > pending data before doing a seek, so if there is a partially parsed
> > > > character we aren't going to get the rest of it.
>
> > > I don't think partially parsed characters can happen with a correct
> > > application. As per C99, an application must not call byte output
> > > functions on a wide-oriented stream, and vice versa.
>
> > Oh, interesting. Should I remove those tests for using byte output
> > functions from the open_wmemstream test?
>
> Yes. Unless real applications are identified that use this undefined
> behaviour, such a test would test the implementation and not the
> specification.
Ok.
> I briefly tried to run the tests against the glibc implementation
> (Ubuntu 10.04) and got many failures. Glibc does not check the
> parameters for null pointers and simply stores them blindly which is
> POSIX-compliant. Things like bad length 6 for "foo" appear to be glibc
> bugs, and test-open_wmemstream even segfaults.
Eh, the open group page requires EINVAL:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html
> > > Discarding the shift state on fseek()/fseeko() is permitted (but should
> > > be documented as this is implementation-defined behaviour).
> > > State-dependent encodings (where this is relevant) are rarely used
> > > nowadays.
>
> > > The conversion to bytes and back probably makes open_wmemstream() quite
> > > slow but I don't think that is very important.
>
> > Note that we do this already for swprintf(). I've added this note:
>
> > IMPLEMENTATION NOTES
> > Internally all I/O streams are effectively byte-oriented, so using wide-
> > oriented operations to write to a stream opened via open_wmemstream()
> > results in wide characters being expanded to a stream of multibyte char-
> > acters in stdio's internal buffers. These multibyte characters are then
> > converted back to wide characters when written into the stream. As a
> > result, the wide-oriented streams maintain an internal multibyte charac-
> > ter conversion state that is cleared on any seek opertion that changes
> > the current position. This should have no effect unless byte-oriented
> > output operations are used on a wide-oriented stream.
>
> This appears to "approve" the use of byte-oriented operations on a
> wide-oriented stream which is in fact undefined.
Hmm, I can remove the last bit ("unless..."), but am struggling to think of good
wording. Maybe:
This should only affect use of byte-oriented output operations on a wide-
oriented stream which is an undefined operation and should be avoided.
> > > > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch
>
> > > The seek functions should check for overflow in the addition (for
> > > SEEK_CUR and SEEK_END) and the conversion to size_t.
>
> > Note that SEEK_CUR is effectively only called with an offset of 0
> > via __ftello(). I'm not sure of the best way to check for overflow, do I have
> > to do something like this:
>
> > static int
> > will_overflow(size_t off, fpos_t pos)
> > {
> > if (pos < 0)
> > return (-pos > off);
> > return (SIZE_MAX - off > pos);
> > }
>
> > For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and
> > for SEEK_END 'will_overflow(ms->len, pos)'
>
> > Presumably SEEK_SET should test for the requested position being
> > beyond SIZE_MAX as well?
>
> > If my above test is ok then I end up with this:
>
> Hmm, perhaps it is better to restrict to SSIZE_MAX instead of SIZE_MAX
> as this greatly simplifies the code. With the above code, comparisons
> may be signed or unsigned and there may be no greater type that fits
> both a size_t and an fpos_t. Negating a negative fpos_t is also
> incorrect because it may overflow. If you restrict the memory buffer to
> ssize_t, an ssize_t can be safely converted to an fpos_t.
Humm. Presumably I have to check for overflow in the expression to grow
the stream as well? I have no idea how to write these checks. :(
Of course, practically speaking (and I can make assumptions about the
implementation I think since this is libc internals after all),
fpos_t == off_t and sizeof(size_t) <= sizeof(off_t) for all platforms
we support (and likely will always be true, files can always be larger
than memory). Does that simplify things if I assume that?
--
John Baldwin
More information about the freebsd-current
mailing list