svn commit: r223198 - head/sys/dev/e1000

John Baldwin jhb at freebsd.org
Fri Jun 17 20:25:40 UTC 2011


On Friday, June 17, 2011 4:06:52 pm John Baldwin wrote:
> Author: jhb
> Date: Fri Jun 17 20:06:52 2011
> New Revision: 223198
> URL: http://svn.freebsd.org/changeset/base/223198
> 
> Log:
>   - Use a dedicated task to handle deferred transmits from the if_transmit
>     method instead of reusing the existing per-queue interrupt task.
>     Reusing the per-queue interrupt task could result in both an interrupt
>     thread and the taskqueue thread trying to handle received packets on a
>     single queue resulting in out-of-order packet processing.
>   - Don't define igb_start() at all on 8.0 and where if_transmit is used.
>     Replace last remaining call to igb_start() with a loop to kick off
>     transmit on each queue instead.
>   - Call ether_ifdetach() earlier in igb_detach().
>   - Drain tasks and free taskqueues during igb_detach().
>   
>   Reviewed by:	jfv
>   MFC after:	1 week
> 
> Modified:
>   head/sys/dev/e1000/if_igb.c
>   head/sys/dev/e1000/if_igb.h

FYI, I ran into a workload where the concurrent reception of packets was
breaking TCP.  Specifically, the two threads could both attempt to process
ACKs for a connection in the syncache.  The first thread would "win" and
create a connection, but the second thread had already done a pcb lookup and
found the listen socket before waiting for a write lock on the TCP pcbinfo.
As a result, the second thread also attempted to create a new connection
based on the syncookie.  However, it failed in in_pcbconnect_setup() with
EADDRINUSE when it found the first connection in the PCB hash.  When it
failed, it dropped the ACK and sent a RST to the remote end, causing the
other end to drop the connection silently.  Unfortunately, the first thread
had created a valid socket which was returned to userland via accept().
That socket contained all the inflight data sent by the remote end before
it received the RST.  The net effect was that a user app would see a
connection that only sent part of its data and then returned EOF.
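
To make the interleaving above concrete, here is a minimal user-space model
in C.  It is only a sketch, not the kernel code: the struct and every
function name (pcb_lookup, expand_under_lock, finish_ack, simulate_race,
simulate_serialized) are hypothetical stand-ins, and the two "threads" are
interleaved by hand so the outcome is deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; none of these names come from FreeBSD. */
enum lookup_result { FOUND_LISTENER, FOUND_CONNECTION };

struct model {
	bool conn_in_hash;	/* full connection already in the PCB hash? */
	int  accepted;		/* sockets that would be returned by accept() */
	int  rst_sent;		/* RSTs sent back to the remote end */
};

/* Step 1: the pcb lookup, done before the write lock is taken. */
static enum lookup_result
pcb_lookup(const struct model *m)
{
	assert(m != NULL);
	return (m->conn_in_hash ? FOUND_CONNECTION : FOUND_LISTENER);
}

/* Step 2: runs with the write lock held; creates the connection. */
static int
expand_under_lock(struct model *m)
{
	if (m->conn_in_hash)
		return (EADDRINUSE);	/* someone else already created it */
	m->conn_in_hash = true;
	m->accepted++;
	return (0);
}

/* Finish processing an ACK using the (possibly stale) lookup result. */
static void
finish_ack(struct model *m, enum lookup_result seen)
{
	if (seen != FOUND_LISTENER)
		return;			/* ordinary ACK on an existing connection */
	if (expand_under_lock(m) != 0)
		m->rst_sent++;		/* drop the ACK and reset the peer */
}

/* Both threads look up first, then take the lock one after the other. */
int
simulate_race(struct model *m)
{
	enum lookup_result a = pcb_lookup(m);
	enum lookup_result b = pcb_lookup(m);	/* stale once A finishes */

	finish_ack(m, a);
	finish_ack(m, b);
	return (m->rst_sent);
}

/* One thread per queue: each lookup sees the result of the previous ACK. */
int
simulate_serialized(struct model *m)
{
	finish_ack(m, pcb_lookup(m));
	finish_ack(m, pcb_lookup(m));
	return (m->rst_sent);
}
```

In this model simulate_race() ends with one accepted socket and one RST,
which is the combination that truncates the stream seen by the user app,
while simulate_serialized() ends with one accepted socket and no RST.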

Note that a truly bidirectional application-level protocol would still break
in this case with an EPIPE/SIGPIPE.  However, if the remote peer is just
opening a socket, dumping some data into it and then closing it without
reading any data, it may close the socket before the RST arrives and thus
encounter no errors, completely unaware that the data it just sent over TCP
was partially (or completely) lost.
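
As an illustration of the sender side, a write-only peer can narrow this
window by half-closing the socket and then waiting for the other end's EOF
before calling close().  The helper below is a hedged sketch: the name
send_and_confirm() is hypothetical, and it assumes a platform that provides
MSG_NOSIGNAL (FreeBSD and Linux do).  An orderly EOF only shows that no RST
arrived before the peer closed; it does not prove the remote application
consumed the data, so a real protocol would still want an application-level
acknowledgement.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len bytes, half-close the socket, then wait
 * for the peer to close its side.  Returns 0 on an orderly EOF, or -1
 * with errno set (e.g. ECONNRESET if a RST arrived, EPIPE on send).
 */
int
send_and_confirm(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	char scratch[256];
	ssize_t n;

	assert(buf != NULL || len == 0);

	/* Send everything; MSG_NOSIGNAL turns SIGPIPE into an EPIPE error. */
	while (len > 0) {
		n = send(fd, p, len, MSG_NOSIGNAL);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}

	/* Tell the peer we are done writing, but keep the read side open. */
	if (shutdown(fd, SHUT_WR) < 0)
		return (-1);

	/* Discard any reply and wait for EOF; a reset shows up as an error. */
	for (;;) {
		n = recv(fd, scratch, sizeof(scratch), 0);
		if (n == 0)
			return (0);
		if (n < 0 && errno != EINTR)
			return (-1);
	}
}
```

The caller would close() the descriptor only after send_and_confirm()
returns, treating a -1 return as "the data may not have been delivered".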

Note that this can still happen when using the syncache, since we may fail to
create a socket when expanding a syncache entry due to resource exhaustion,
giving similarly unpleasant failure semantics (i.e. the remote user app
doesn't get an error and has no clue that their data is in fact lost).

-- 
John Baldwin

