short read/write and error code

Konstantin Belousov kostikbel at gmail.com
Wed Aug 1 16:28:44 UTC 2012


On Wed, Aug 01, 2012 at 07:23:09PM +1000, Bruce Evans wrote:
> On Wed, 1 Aug 2012, Konstantin Belousov wrote:
> 
> >On Wed, Aug 01, 2012 at 10:49:16AM +0800, David Xu wrote:
> >>POSIX requires write() to return the number of bytes actually written;
> >>the same rule applies to read().
> >>
> >>http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
> >>>RETURN VALUE
> >>>
> >>>Upon successful completion, write() [XSI]   and pwrite()  shall
> >>>return the number of bytes actually written to the file associated
> >>>with fildes. This number shall never be greater than nbyte.
> >>>Otherwise, -1 shall be returned and errno set to indicate the error.
> >>
> >>http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
> >>>RETURN VALUE
> >>>
> >>>Upon successful completion, read() [XSI]   and pread()  shall return
> >>>a non-negative integer indicating the number of bytes actually read.
> >>>Otherwise, the functions shall return -1 and set errno to indicate
> >>>the error.
> >Note that the wording is only about successful return, not about the case
> >when an error occurred. I do think that if fo_read() returned an error, and
> >the error is not of the kind 'interruption', then the error shall be returned
> >as is.
> 
> That is clearly not what is intended.  write() is unusable if it won't
> tell you how many bytes it wrote.  According to your interpretation,
> recalcitrantix would conform to POSIX if all its writes wrote whatever
> they could and then returned -1 after detecting the error EPOSIXFUZZY.
I think this is an obvious stretch, because no useful implementation would
insert an _artificial_ error.

> 
> The usability is specified for signals.  From an old POSIX draft:
> 
> % 51235              If write( ) is interrupted by a signal before it writes any data, it shall return -1 with errno set to
> % 51236              [EINTR].
> % 51237              If write( ) is interrupted by a signal after it successfully writes some data, it shall return the
> % 51238              number of bytes written.
This is exactly what existing code does.
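
For illustration only, a userland caller relying on these two rules could be
sketched as below. write_all() is a hypothetical helper, not code from the
tree, and treating a zero-byte return as EIO is an assumption made here just
to keep the loop from spinning.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write() until len bytes are out.
 * Returns 0 on success, -1 on error (errno set); *written always holds
 * the number of bytes known to have been written.
 */
static int
write_all(int fd, const void *buf, size_t len, size_t *written)
{
	const char *p = buf;
	size_t done = 0;
	int ret = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n > 0) {
			/* Short or full write: count it and go on. */
			done += (size_t)n;
		} else if (n == -1 && errno == EINTR) {
			/* Interrupted before any data was written: retry. */
			continue;
		} else {
			/* Assumption: treat a zero return as an i/o error. */
			if (n == 0)
				errno = EIO;
			ret = -1;
			break;
		}
	}
	*written = done;
	return (ret);
}
```

A write interrupted after some data was written shows up here as an ordinary
short count, so the caller never loses track of how much went out.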

> 
> POSIX formally defines "Successfully Transferred", mainly for aio.  I
> couldn't find any formal definition of "successfully writes", but clearly
> it is nonsense for a write to be unsuccessful if a reader on the local
> system or on an external system has successfully read some of the data
> written by the write.
> 
> FreeBSD does try to convert EINTR to 0 after some data has been written,
> to conform to the above.  SIGPIPE should return EINTR to be returned to
> dofilewrite(), so there should be no problem for SIGPIPE.  But we were
> reminded of this old FreeBSD bug by problems with SIGPIPE.
Sorry, I do not understand this, esp. the second sentence.

As I said, the patch's behaviour with regard to SIGPIPE is just wrong.
> 
> POSIX contradicts itself by disallowing successful completion if _any_
> error is detected:
> 
> % 435              RETURN VALUE
> % 436                        This section indicates the possible return values, if any.
> % 437                        If the implementation can detect errors, ``successful completion'' means that no error
> % 438                        has been detected during execution of the function. If the implementation does detect
> 
> Recalcitrantix has 2 versions according to which of these contradictions
> has precedence.  In one version, writes do as much as possible before
> returning -1/EPOSIXFUZZY, as above.  In the other version, this still
> happens for most writes.  But ones that are interrupted by a signal after
> having written some data return the number of bytes written, according to
> the "shall" for the interrupted case.  Perhaps there are some other weird
> cases where writes are required to work :-).
> 
> >>I have the following patch to fix our code to be compatible with POSIX:
> >...
> >
> >>-current only resets the error code to zero for a short write when the
> >>code is ERESTART, EINTR or EWOULDBLOCK.
> >>But this is incorrect, at least for pipes: when EPIPE is returned,
> >>some bytes may have already been written. For a named pipe, I may not
> >>care whether a reader has disappeared, because a new reader can come
> >>in and talk with the writer again, so I need to know how many bytes
> >>have been written. The same applies to the reader: I don't care that
> >>the writer is gone, since it can come in again and talk with the
> >>reader. So I suggest removing the surplus code in -current's
> >>dofilewrite() and dofileread().
> >Then fix the pipe code, and do not introduce the behaviour change for all
> >file types?
> 
> Because returning the error to userland breaks all file types that
> want to return a short i/o (mainly special files whose i/o cannot be
> backed out of).  They are just detecting and returning an error as a
> courtesy to upper layers, and to simplify the implementation.  The
> syscall API doesn't permit returning both the error code (the reason
> for the short i/o) and the short count, so the error code must be
> cleared to allow the short count to be returned.
No, there is only one sane behaviour for fo_read and fo_write: either
return no error (or an interruption error) and adjust resid, or
return an error. Both returning an error and adjusting resid is just wrong.
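
As a rough userland model of what -current does after fo_write() returns (per
the description quoted above: the error is reset only for ERESTART, EINTR or
EWOULDBLOCK, and only when some data moved), consider the sketch below. The
function names and MODEL_ERESTART are hypothetical stand-ins; this is not the
actual dofilewrite() code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal ERESTART value. */
#define	MODEL_ERESTART	(-1)

/*
 * Clear the error only when some bytes were transferred (resid changed)
 * and the error is one of the interruption kinds.
 */
static int
model_fixup_error(int error, size_t requested, size_t resid)
{
	if (error != 0 && resid != requested &&
	    (error == MODEL_ERESTART || error == EINTR ||
	    error == EWOULDBLOCK))
		error = 0;
	return (error);
}

/*
 * What the syscall hands back: the byte count when no error remains,
 * otherwise -1 with the error stored in *errp.
 */
static long
model_syscall_return(int error, size_t requested, size_t resid, int *errp)
{
	error = model_fixup_error(error, requested, resid);
	*errp = error;
	if (error != 0)
		return (-1);
	return ((long)(requested - resid));
}
```

In this model an EPIPE with an adjusted resid still returns -1, which is the
case under dispute for pipes.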

The proposed patch makes the generic i/o layer much less flexible, and
probably prevents implementation of things like transactional writes.

We should fix sys_pipe.c and not require filesystems to roll back uio
into an inconsistent state to report errors (since rolling back into a
consistent state is typically impossible but is required after the patch).

> 
> Bruce