write system call violates POSIX standard

Wed Feb 15 20:02:52 UTC 2012

On Wed, 15 Feb 2012, Konstantin Belousov wrote:

> On Wed, Feb 15, 2012 at 03:36:11PM +0100, Nicolas Bourdaud wrote:
>> When a write() cannot transfer as many bytes as requested (because of a
>> file limit), it fails instead of transferring as many bytes as there is
>> room to write.
>>
>> This is a violation of the POSIX standard:
>> http://pubs.opengroup.org/onlinepubs/007904975/functions/write.html
>> ...
> It seems that you are right.

Here is a corresponding test to show the complete brokenness of
RLIMIT_FSIZE for [f]truncate():

%%%
#include <sys/resource.h>
#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define LIMSIZE		60000

int
main(void)
{
	struct rlimit lim;
	struct stat sb;
	int fd;

	/* Get EFBIG instead of being killed if the limit is enforced. */
	if (signal(SIGXFSZ, SIG_IGN) == SIG_ERR)
		err(1, "signal");
	/* Lower only the soft file size limit. */
	if (getrlimit(RLIMIT_FSIZE, &lim) != 0)
		err(1, "getrlimit");
	lim.rlim_cur = LIMSIZE;
	if (setrlimit(RLIMIT_FSIZE, &lim) != 0)
		err(1, "setrlimit");

	fd = open("result.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		err(1, "open");
	if (fstat(fd, &sb) != 0)
		err(1, "first stat");
	if (sb.st_size != 0)
		errx(1, "O_TRUNC failed to truncate the file");
	/* Try to extend the file to twice the soft limit; this should fail. */
	if (ftruncate(fd, 2 * LIMSIZE) != 0)
		err(1, "ftruncate");
	if (fstat(fd, &sb) != 0)
		err(1, "stat");
	warnx("size = %jd", (intmax_t)sb.st_size);
	if (sb.st_size == 2 * LIMSIZE)
		errx(1, "ftruncate failed to honour RLIMIT_FSIZE, as expected");
	if (sb.st_size != 0)
		errx(1, "ftruncate worked incorrectly, but not as expected");
	errx(0, "ftruncate worked correctly, but not as expected");
}
%%%

> A solution could be to return an error if uio->uio_offset itself is
> larger than RLIMIT_FSIZE. If it is less than the limit, the function
         ^ or equal to, and the count is not 0 (?)
> could trim the supplied uio at the RLIMIT_FSIZE value instead.
>
> Do you want to work on the patch ?

Only indirectly for me :-).
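The rule proposed above, including the "or equal to" correction, can be
sketched as a pure function over plain integers.  This is only an
illustration under assumptions: the name rlimit_clamp_write() is
hypothetical, it is not the vfs code, and offset, count and limit stand in
for uio->uio_offset, the request size and the RLIMIT_FSIZE soft limit.
Treating a zero-length write as always allowed follows the "(?)" above and
is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not FreeBSD's vfs code): given a write of `count`
 * bytes at `offset` and a soft limit `limit`, return how many bytes may be
 * written, or -1 for an EFBIG/SIGXFSZ error.  Fails if the offset is at or
 * beyond the limit and the count is nonzero; otherwise trims the request
 * at the limit.
 */
static int64_t
rlimit_clamp_write(int64_t offset, int64_t count, int64_t limit)
{
	assert(offset >= 0 && count >= 0 && limit >= 0);

	if (count == 0)
		return (0);		/* assumed: a null write is never an error */
	if (offset >= limit)
		return (-1);		/* no room at all: error */
	if (count > limit - offset)
		count = limit - offset;	/* partial write up to the limit */
	return (count);
}
```

With limit = 60000 (LIMSIZE in the test above), a 100-byte write at
offset 59990 is trimmed to 10 bytes, and any nonzero write at offset 60000
fails.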

Note that both of these are XSI extensions, and FreeBSD doesn't claim
to support XSI, so it doesn't have to duplicate any XSI bugs in these
APIs.  But it is clearly a bug for [f]truncate() not to honor the rlimit
at all.  Anyone can try to create multi-petabyte files in FreeBSD, and
will often succeed, and such files may take a lot of space for metadata
even though all blocks beyond the rlimit must be zero.

Note that the error handling is different from that for write(), and
simpler, for [f]truncate().  The current error checking in vfs is
correct for these, except for null changes (see below).
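Since [f]truncate() has no partial success to compute, the check reduces
to a single comparison.  A minimal sketch, assuming the hypothetical name
truncate_check_strict() and plain integers in place of the real vnode
attributes; like FreeBSD's write() check described below, it never looks
at the current file size:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical strict check for [f]truncate() (not FreeBSD's code):
 * reject any new length beyond the soft limit with EFBIG.  The caller
 * would also raise SIGXFSZ on failure.  The current size is ignored.
 */
static int
truncate_check_strict(int64_t length, int64_t limit)
{
	assert(length >= 0 && limit >= 0);
	return (length > limit ? EFBIG : 0);
}
```

Under this check, the test program's ftruncate(fd, 2 * LIMSIZE) would fail
with EFBIG instead of extending the file to 120000 bytes.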

There is another thread (PR or POSIX mail?) about truncate() having
different, broken semantics from ftruncate() when its effect is null.
POSIX specifies that ftruncate() shall mark times for update on
successful completion, as usual, but for truncate() it specifies that
"if the file size is changed, this function shall mark for update...".
This is a bug in POSIX, but Linux apparently implemented it.  FreeBSD doesn't
implement it, at least in ffs.  In ffs_update(), the implementation
is to check for this case early and do nothing except mark for update
before returning.

POSIX has fuzzy wording for the interaction of these bugs.  Suppose
that the file size is already larger than the rlimit, and we try to
truncate it to its current size.  Is this a null change or an EFBIG
error?  POSIX only says (for [f]truncate) that "if the request _would_
_cause_ the file size to exceed the soft file limit, [then it is an
error]".  I think a null change "wouldn't cause" the file to exceed
the limit in this case, because the cause of exceeding the limit is
that the limit was already exceeded.  However, it takes a delicate
reading of "would cause" to get this interpretation, and FreeBSD never
did it this way in cases where it actually checks the limit -- for
write(), the limit is checked before even looking at the current
file size.  The centralization of the limit checking makes it harder
to change this, because the central function doesn't know the file
size.

Truncations that would reduce the file size from beyond the limit to a
smaller size that is still beyond the limit are also interesting.  Are
these allowed?  They do change the file size, but they don't cause the
file size to exceed the limit, so a strict reading of "would cause"
again allows them.
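The "would cause" reading needs the current file size, which is exactly
what the centralized limit check lacks.  A sketch under that reading, with
the hypothetical name truncate_check_would_cause(); treating any growth
past the limit as an error, even for a file that is already oversized, is
an assumption that the text above leaves open:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical [f]truncate() check under the literal "would cause"
 * reading: the request is an error only if the new length is beyond the
 * limit AND larger than the current size.  Null changes and reductions
 * of an already-oversized file are therefore allowed.
 */
static int
truncate_check_would_cause(int64_t cursize, int64_t length, int64_t limit)
{
	assert(cursize >= 0 && length >= 0 && limit >= 0);
	if (length > limit && length > cursize)
		return (EFBIG);		/* assumed: any growth past the limit */
	return (0);
}
```

For a 200000-byte file with a 60000-byte limit, truncating to 200000 (a
null change) or to 100000 (a reduction) is allowed, while truncating an
empty file to 120000 is rejected.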

write() has some very nice, different bugs depending on the
interpretation of the corresponding "would cause" for it.  In
FreeBSD, because the limit checking is done before even looking at the
size of the file, write()s to the middle of a big file are rejected
if they would extend past the limit.  But the POSIX specification is
that "if the request _would_ _cause_ the file size to exceed the soft
file limit, [then as for truncate, except it is not an error if the
write starts before the limit, and bytes shall be written if possible
up to the limit in this case]".  This wording is not very different
from that for ftruncate, but now it seems even harder to blame the
write for causing the limit to be exceeded if the write would be in
the middle of the file.  It seems useful to allow writing in the middle
of a big file irrespective of the limit, to allow not-fully-trusted
applications to scribble in a big file that you have reserved for them.
But the above bug in ftruncate becomes enormous if you allow writing
in the middle of a big file whose creation the bug allowed.
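One way to let a write() into the middle of a big file succeed
irrespective of the limit is to bound writes by the larger of the current
file size and the limit, instead of by the limit alone.  The sketch below
is a hypothetical policy, not FreeBSD's behaviour or a POSIX requirement,
and write_allowed_size_aware() is an invented name.  As noted above, it
only makes sense once [f]truncate() honours the limit, since otherwise a
process could first inflate a file and then write anywhere in it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical size-aware write() limit check.  Bytes inside the existing
 * file are always allowed; only the part of the write that would extend
 * the file past max(cursize, limit) is trimmed.  Returns the number of
 * bytes that may be written, or -1 (EFBIG) if no byte can be written.
 */
static int64_t
write_allowed_size_aware(int64_t cursize, int64_t offset, int64_t count,
    int64_t limit)
{
	int64_t bound;

	assert(cursize >= 0 && offset >= 0 && count >= 0 && limit >= 0);
	if (count == 0)
		return (0);
	/* The file may already exceed the limit; never trim below its end. */
	bound = cursize > limit ? cursize : limit;
	if (offset >= bound)
		return (-1);
	if (count > bound - offset)
		count = bound - offset;
	return (count);
}
```

For a 1000000-byte file with a 60000-byte limit, a 4096-byte write at
offset 500000 is allowed in full (the FreeBSD check described above would
reject it), and a 4096-byte write at offset 999000 is trimmed to 1000
bytes.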

Just above that paragraph in the POSIX write() specification, handling
related to ENOSPC is specified as "if a write() requests that more bytes be written than
there is room for (for example, ... the physical end of the medium),
only as many bytes as there is room for shall be written".  This is a
little fuzzy, but it seems to be intended to mean that these bytes shall
be written with no error if possible.  This disallows the ffs treatment
of backing out of the entire write after an ENOSPC error in the same
way as after an EIO error.
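On the application side, this wording means that a write() near the end
of the medium, or near the file size limit, may return a short count
instead of an error, with the error only showing up on the next call.
Callers therefore need a loop like the following sketch.  write_all() is
our name, not a POSIX or FreeBSD function, and it assumes SIGXFSZ is
ignored (as in the test above) so that hitting the limit yields EFBIG
rather than killing the process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all of buf, retrying after short writes.  A write() that hits the
 * limit or the end of the medium may transfer only what fits; the next
 * write() then fails (e.g. with EFBIG or ENOSPC), so the caller sees the
 * real error.  Returns 0 on success, or -1 with errno set.
 */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data: retry */
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```

With the ffs back-out behaviour described above, the first write() would
instead fail with ENOSPC after transferring nothing, so the loop returns
-1 immediately.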

Bruce