svn commit: r324405 - head/sys/kern

Tue Oct 10 00:09:51 UTC 2017

https://reviews.freebsd.org/D12633

On Mon, Oct 9, 2017 at 5:01 PM, Jason Eggleston <eggnet at gmail.com> wrote:

> In tcp_input.c, this is where a RST is handled. so_error gets ECONNRESET,
> then tcp_close() is called.
>
>                                 case TCPS_ESTABLISHED:
>                                 case TCPS_FIN_WAIT_1:
>                                 case TCPS_FIN_WAIT_2:
>                                 case TCPS_CLOSE_WAIT:
>                                 case TCPS_CLOSING:
>                                 case TCPS_LAST_ACK:
>                                         so->so_error = ECONNRESET;
>                                 close:
>                                         /* FALLTHROUGH */
>                                 default:
>                                         tp = tcp_close(tp);
>
> tcp_close() calls soisdisconnected() which sets this flag:
>
> so->so_state |= SS_ISDISCONNECTED;
>
> Followed by:
>
>                 ...
>                 socantrcvmore_locked(so);
>                 ...
>                 socantsendmore_locked(so);
>                 ...
>
> Those two functions set these flags:
>
> so->so_rcv.sb_state |= SBS_CANTRCVMORE;
> so->so_snd.sb_state |= SBS_CANTSENDMORE;
>
> A TCP session cannot survive a RST, and i/o calls will not work afterword,
> no matter how many times you try.
>
> Having said that, there's a race condition between when so_error is set
> and when the socket and socket buffer flags are set, and I agree with your
> strategy of:
>
> - have sendfile() match send()
> - The current send() behavior is probably fine as is
>
> I agree sendfile() should match write() and send(), by checking in this
> order:
>
> SBS_CANTSENDMORE
> so_error
> SS_ISCONNECTED
>
> and will prep a new review with your patch.
>
> On Mon, Oct 9, 2017 at 4:20 PM, Gleb Smirnoff <glebius at freebsd.org> wrote:
>
>>   Sean & Jason,
>>
>> On Sat, Oct 07, 2017 at 11:30:57PM +0000, Sean Bruno wrote:
>> S> Author: sbruno
>> S> Date: Sat Oct  7 23:30:57 2017
>> S> New Revision: 324405
>> S> URL: https://svnweb.freebsd.org/changeset/base/324405
>> S>
>> S> Log:
>> S>   Check so_error early in sendfile() call.  Prior to this patch, if a
>> S>   connection was reset by the remote end, sendfile() would just report
>> S>   ENOTCONN instead of ECONNRESET.
>> S>
>> S>   Submitted by:      Jason Eggleston <jason at eggnet.com>
>> S>   Reviewed by:       glebius
>> S>   Sponsored by:      Limelight Networks
>> S>   Differential Revision:     https://reviews.freebsd.org/D12575
>> S>
>> S> Modified:
>> S>   head/sys/kern/kern_sendfile.c
>> S>
>> S> Modified: head/sys/kern/kern_sendfile.c
>> S> ============================================================
>> ==================
>> S> --- head/sys/kern/kern_sendfile.c    Sat Oct  7 23:10:16 2017
>> (r324404)
>> S> +++ head/sys/kern/kern_sendfile.c    Sat Oct  7 23:30:57 2017
>> (r324405)
>> S> @@ -514,6 +514,11 @@ sendfile_getsock(struct thread *td, int s, struct
>> file
>> S>      *so = (*sock_fp)->f_data;
>> S>      if ((*so)->so_type != SOCK_STREAM)
>> S>              return (EINVAL);
>> S> +    if ((*so)->so_error) {
>> S> +            error = (*so)->so_error;
>> S> +            (*so)->so_error = 0;
>> S> +            return (error);
>> S> +    }
>> S>      if (((*so)->so_state & SS_ISCONNECTED) == 0)
>> S>              return (ENOTCONN);
>> S>      return (0);
>>
>> Despite my initial positive review, now I am quite unsure on that.
>>
>> Problem is that sendfile(2) isn't defined by SUS, so there is no
>> distinctive final answer on that. Should we match other OSes?
>> Should we match our historic behaviour? Or should we match
>> write(2)/send(2) to socket, which are closest analogy. I probably
>> believe in the latter: sendfile(2) belongs to write(2)/send(2)
>> family.
>>
>> SUS specifies that write may return ECONNRESET. It also documents
>> EPIPE. However, our manual page documents only EPIPE for both
>> send(2) and write(2). For write we have:
>>
>>         SOCKBUF_LOCK(&so->so_snd);
>>         if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
>>                 SOCKBUF_UNLOCK(&so->so_snd);
>>                 error = EPIPE;
>>                 goto out;
>>         }
>>         if (so->so_error) {
>>                 error = so->so_error;
>>                 so->so_error = 0;
>>                 SOCKBUF_UNLOCK(&so->so_snd);
>>                 goto out;
>>         }
>>
>> Indeed, EPIPE will be returned prior to return/clear of so_error,
>> which supposedly is ECONNRESET.
>>
>> In the sendfile(2) implementation we see exactly same code:
>>
>>                 if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
>>                         error = EPIPE;
>>                         SOCKBUF_UNLOCK(&so->so_snd);
>>                         goto done;
>>                 } else if (so->so_error) {
>>                         error = so->so_error;
>>                         so->so_error = 0;
>>                         SOCKBUF_UNLOCK(&so->so_snd);
>>                         goto done;
>>                 }
>>
>> But it isn't reached. Before due to SS_ISCONNECTED check, now
>> due to your change. Now we got two spots where so_error is
>> returned/cleared.
>>
>> For the receive family (read(2) and recv(2)), SUS specifies
>> ECONNRESET and our code does that. It also clears so_error.
>>
>> So, after your change at least one standard case is broken: an
>> application that polls/selects for both readability and
>> writeability of a socket. It discovers that socket is available
>> for writing, does sendfile(2), receives ECONNRESET. Then it does
>> read on the socket, and blocks(?) or returns a bogus error(?),
>> not sure. But definitely not ECONNRESET, since it is now cleared.
>>
>> Now, where does this ENOTCONN come from? These two lines:
>>
>>         if (((*so)->so_state & SS_ISCONNECTED) == 0)
>>                 return (ENOTCONN);
>>
>> can be traced back all the way to original 1998 commit of sendfile.
>>
>> Here comes difference between sendfile(2) and send(2). In send(2),
>> this check is done _after_ so_error check, so your change is correct
>> in placing of so_error check wrt so_state check, but is incorrect
>> in placing wrt socket buffer state check.
>>
>> After all this considerations, I would suggest to revert your change,
>> and apply this patch. It would make behavior of sendfile(2) to match
>> send(2). I would appreciate more reviews around it.
>>
>> --
>> Gleb Smirnoff
>>
>
>