short read/write and error code

Warner Losh imp at bsdimp.com
Wed Aug 1 14:12:10 UTC 2012


On Aug 1, 2012, at 1:59 AM, David Xu wrote:

> On 2012/8/1 15:19, Konstantin Belousov wrote:
>> On Wed, Aug 01, 2012 at 10:49:16AM +0800, David Xu wrote:
>>> POSIX requires write() to return the number of bytes actually written;
>>> the same rule applies to read().
>>> 
>>> http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
>>>> RETURN VALUE
>>>> 
>>>> Upon successful completion, write() [XSI]   and pwrite()  shall
>>>> return the number of bytes actually written to the file associated
>>>> with fildes. This number shall never be greater than nbyte.
>>>> Otherwise, -1 shall be returned and errno set to indicate the error.
>>> 
>>> http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
>>>> RETURN VALUE
>>>> 
>>>> Upon successful completion, read() [XSI]   and pread()  shall return
>>>> a non-negative integer indicating the number of bytes actually read.
>>>> Otherwise, the functions shall return -1 and set errno to indicate
>>>> the error.
>> Note that the wording is only about successful return, not about the case
>> when an error occurred. I do think that if fo_read() returned an error, and
>> the error is not of the 'interruption' kind, then the error shall be
>> returned as is.
> I do think data is more important than the error code.  If a 512-byte block is bad,
> should all bytes in the block be thrown away when you could actually get some bytes from it?
> This might be very important to someone, such as a password or a bank account.  This
> is just an example; whether a filesystem works this way is irrelevant.

You do know that with disk drives it is an all-or-nothing sort of thing at the sector level.  Either you get the whole sector, or you get none of it.  There are no partial sector reads, and there is generally no way to get at the data.  Some drives allow you to access raw tracks, but those interfaces are never connected to read(); they usually go through an ioctl that issues the special command and returns the results.  And even then, it returns everything (perhaps including the ECC bytes).

> As the program continues to execute, the next read()/write() should return -1 with errno
> set.  I think both sockets and pipes already work this way; it is dofileread/dofilewrite
> that prevent it from happening.

Usually it is up to the driver to make this decision.  Most drivers already return 0 when they've put any data into the buffer.  Cases where the driver returns an error and resid also indicates that data was transferred would be vanishingly rare.
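
For reference, the two policies being argued over can be sketched in user-space C.  This is only an illustration of my reading of the thread, not the actual dofileread()/dofilewrite() code: sketch_finish_io(), its mask_all flag, and SKETCH_ERESTART are hypothetical names (ERESTART is kernel-internal on FreeBSD, so a stand-in value is used here).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the kernel-internal ERESTART code. */
#define SKETCH_ERESTART (-1)

/*
 * Decide what a read/write call reports once the lower layer is done.
 *
 * requested: bytes the caller asked for
 * resid:     bytes left untransferred
 * error:     errno-style code from the lower layer (0 means success)
 * mask_all:  0 = mask only interruption-type errors after a short
 *                transfer (the -current behaviour described above);
 *            nonzero = mask any error after a short transfer (the
 *                proposed behaviour)
 *
 * Returns the byte count, or -1 with the error stored in *errp.
 */
static ssize_t
sketch_finish_io(size_t requested, size_t resid, int error, int mask_all,
    int *errp)
{
	size_t done;

	assert(resid <= requested);
	done = requested - resid;

	/* An error is only a candidate for masking if some data moved. */
	if (error != 0 && done > 0) {
		if (mask_all || error == SKETCH_ERESTART ||
		    error == EINTR || error == EWOULDBLOCK)
			error = 0;
	}

	*errp = error;
	return (error != 0 ? -1 : (ssize_t)done);
}
```

With mask_all set to 0, a 400-byte partial transfer that ends in EPIPE reports -1/EPIPE; with mask_all nonzero it reports 400 and leaves the error for the next call to discover.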

>>> I have the following patch to make our code compatible with POSIX:
>> ...
>> 
>>> -current only resets the error code to zero for a short write when the
>>> code is ERESTART, EINTR or EWOULDBLOCK.
>>> But this is incorrect, at least for pipes: when EPIPE is returned,
>>> some bytes may have already been written. For a named pipe, I may not
>>> care whether a reader has disappeared, because a new reader can come
>>> in and talk with the writer again, so I need to know how many bytes
>>> have been written. The same applies to the reader: I don't care that
>>> the writer is gone, since it can come in again and talk with the
>>> reader. So I suggest removing the surplus code in -current's
>>> dofilewrite() and dofileread().
>> Then fix the pipe code, and not introduce the behaviour change for all
>> file types ?
> see above, I think data is more important than error code,  and next read/write will
> get the error.
> 
>>> For EPIPE, we still deliver SIGPIPE to the current thread, but return
>>> the number of bytes actually written.
>> And this sounds wrong. I think that fixing the code for pipes would also
>> semi-magically make this correct.

Yes.  Pipes are too magical and don't match devices very well.
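
As a point of reference for the EPIPE discussion, here is a small user-space sketch of the simplest case: a write to a pipe whose read end is already closed, with SIGPIPE ignored so the error code is visible.  sketch_write_to_dead_pipe() is a hypothetical helper, and it does not exercise the contested case where some bytes were written before the reader went away.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one byte to a pipe that has no reader.
 * Returns the errno from write(), 0 if the write succeeded,
 * or -1 if the pipe could not be created.
 * Note: this changes the process-wide SIGPIPE disposition.
 */
static int
sketch_write_to_dead_pipe(void)
{
	int fds[2];
	int saved;
	ssize_t n;

	/* Ignore SIGPIPE so write() fails instead of killing us. */
	signal(SIGPIPE, SIG_IGN);

	if (pipe(fds) == -1)
		return (-1);

	/* Close the read end: the pipe now has no reader. */
	close(fds[0]);

	n = write(fds[1], "x", 1);
	assert(n == -1 || n == 1);	/* a 1-byte write cannot be partial */
	saved = (n == -1) ? errno : 0;

	close(fds[1]);
	return (saved);
}
```

On a POSIX system this is expected to return EPIPE, since nothing can be written once the last reader is gone.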

Warner

