kern/147226: read(fd, buffer, len) returns -1 immediately, if len >=2147483648

Thu Jun 3 05:10:50 UTC 2010

On Mon, 31 May 2010, Garrett Cooper wrote:

> The following reply was made to PR kern/147226; it has been noted by GNATS.

The following reply is only to the addresses in the header mangled by
GNATS, so it might be lost by GNATS as usual:

> > From: Bruce Cran <bruce at cran.org.uk>
> > To: bug-followup at FreeBSD.org, eugene.kharitonov at gmail.com
> > Cc:
> > Subject: Re: kern/147226: read(fd, buffer, len) returns -1 immediately, if
> >  len >=2147483648
> > Date: Mon, 31 May 2010 16:21:05 +0100
> >
> > This actually looks like a 64-bit bug.
> > http://opengroup.org/onlinepubs/007908775/xsh/read.html says that up to
> > SSIZE_MAX bytes must be accepted, whereas FreeBSD only accepts up to
> > INT_MAX bytes.
>
> The point being that SSIZE_MAX is INT_MAX on 32-bit archs and LONG_MAX
> on 64-bit archs.

Yes, the point is that SSIZE_MAX is only broken on 64-bit arches.  It
is supposed to give the limit for read() and write() (but not for much
else (1)), but the limit is actually INT_MAX, which differs from
SSIZE_MAX on broken arches.

The POSIX rationale makes it clear that SSIZE_MAX gives the actual limit
and that the actual limit may be significantly less than the maximum
of the type used to pass the value (ssize_t), but the POSIX spec conflicts
with this, at least in the old 2001 draft7:

% Spec:
% (1)
% 9110               {SSIZE_MAX}
% 9111                   Maximum value of an object of type ssize_t.
% 9112                   Minimum Acceptable Value: {_POSIX_SSIZE_MAX}
% (2)
% 13001 XSI           The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}].  The
% 13002               type useconds_t shall be an unsigned integer type capable of storing values at least in the range

Here (2) is correct but redundant since (1) requires more, but (1) is
incorrect since it requires SSIZE_MAX to be the maximum of the type
ssize_t, while it is SSIZE_MAX that gives the actual limit, and there may
be no raw object (signed integer) type that has that maximum.
On the broken arches, it happens that such a type exists, but it is
not used due to ABI considerations.  The rationale explicitly allows
making SSIZE_MAX smaller so as to give the actual maximum without
requiring mangling of the ABI to limit it to the actual maximum or
mangling of the actual maximum to make it match the ABI.

% Rationale:
% 8548            ssize_t        This is intended to be a signed analog of size_t. The wording is such that an
% 8549                           implementation may either choose to use a longer type or simply to use the signed
% 8550                           version of the type that underlies size_t. All functions that return ssize_t (read( )
% 8551                           and write( )) describe as ``implementation-defined'' the result of an input exceeding
% 8552                           {SSIZE_MAX}. It is recognized that some implementations might have ints that
% 8553                           are smaller than size_t. A conforming application would be constrained not to
% 8554                           perform I/O in pieces larger than {SSIZE_MAX}, but a conforming application
% 8555                           using extensions would be able to use the full range if the implementation
% 8556                           provided an extended range, while still having a single type-compatible interface.

There is no corresponding rationale for SSIZE_MAX.

(1) Here is a complete list of APIs documented by the old draft as being
affected by the SSIZE_MAX limit.  Note that it is much smaller than the
list of APIs that use ssize_t.

     mq_receive(), msgrcv()
     read(), pread()
     readlink()
     write(), pwrite()
     strfmon() (a bogus (2) limit on the `size_t maxsize' arg).

(2) This limit is intended to limit the buffer size to a value that can
be returned by strfmon().  This is possible since strfmon() returns
ssize_t, but bogus since the same problem affects interfaces like
snprintf() to a much larger extent, and there is no problem and thus
should be no error unless a too-large value actually needs to be
returned, but strfmon() can never usefully want to return a too-large
value, unlike snprintf() which only almost never wants to return one
-- suppose someone has somehow obtained an enormous buffer (one of
size > SSIZE_MAX) and passes its size to strfmon() -- then who is
strfmon() to reject this buffer just because the format _might_ be
even more preposterous so as to generate a result larger than SSIZE_MAX?
(The behaviour is undefined if the passed size differs from the actual
size, even if bytes beyond the end of the buffer would not be accessed
by a naive implementation for this call, but strfmon() cannot easily
detect this error.)  For snprintf(), it is useful to be able to return
results much larger than the 20 characters or so needed for printing the
maximum useful monetary value (~= the global GDP), but snprintf()'s
API is not pointlessly typedefed and in any case limits on the buffer
size have no effect on the returned size -- the returned size can
easily want to exceed (unsigned) SIZE_MAX even if the buffer size is 0,
by using enough %.*'s in the format to reach SIZE_MAX.
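
That the returned size is independent of the buffer size can be seen
with a zero-size buffer.  A small sketch; formatted_length() is a made-up
name, and the widths used are kept far below INT_MAX since the int return
type cannot represent anything larger:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the number of bytes that formatting
 * `value' with a minimum field width of `width' would produce, without
 * writing anything (C99 allows a null buffer when the size is 0).
 */
int formatted_length(int width, int value)
{
    assert(width >= 0);
    return snprintf(NULL, 0, "%*d", width, value);
}
```

Here formatted_length(1000, 7) asks for a 1000-byte result although the
buffer size passed to snprintf() is 0; the field width alone determines
the returned size.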

Later versions of POSIX: in the latest public version (2004) found by google:
- no change in the spec for {SSIZE_MAX}.  Since the rationale cannot change,
   the spec is still broken.
- no change in the spec for snprintf()'s return value.  It still specifies
   the impossible, by requiring snprintf() to return the number of bytes
   that would be written in all cases, but this is impossible if the number
   would exceed INT_MAX.  This bug is inherited from C99.

Bruce
