buf_ring in HEAD is racy

Michael Tuexen Michael.Tuexen at lurchi.franken.de
Tue Dec 17 16:39:50 UTC 2013


On Dec 17, 2013, at 4:43 PM, Adrian Chadd <adrian at freebsd.org> wrote:

> ie:
> 
> Index: sys/dev/ixgbe/ixgbe.c
> ===================================================================
> --- sys/dev/ixgbe/ixgbe.c       (revision 2995)
> +++ sys/dev/ixgbe/ixgbe.c       (working copy)
> @@ -5178,6 +5178,7 @@
>        struct ixgbe_hw *hw = &adapter->hw;
>        u32  missed_rx = 0, bprc, lxon, lxoff, total;
>        u64  total_missed_rx = 0;
> +       u64  odrops;
What about
+       u64 odrops = 0;
since odrops seems to be used uninitialized?

Best regards
Michael
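
For reference, a sketch of how the two affected hunks might read with that
initialization applied; note that the later hunk assigns "drops" to ifq_drops
even though the accumulator is declared as "odrops", so that name presumably
needs to match as well (this is a reading of the posted diff, not a tested
change):

    -       u64  odrops;
    +       u64  odrops = 0;                /* start the per-ring sum from zero */

    -       ifp->if_snd.ifq_drops = drops;
    +       ifp->if_snd.ifq_drops = odrops; /* use the accumulator declared above */
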
> 
>        adapter->stats.crcerrs += IXGBE_READ_REG(hw, IXGBE_CRCERRS);
>        adapter->stats.illerrc += IXGBE_READ_REG(hw, IXGBE_ILLERRC);
> @@ -5308,6 +5309,11 @@
>                adapter->stats.fcoedwtc += IXGBE_READ_REG(hw, IXGBE_FCOEDWTC);
>        }
> 
> +       /* TX drops */
> +       for (int i = 0; i < adapter->num_queues; i++) {
> +               odrops += adapter->tx_rings[i].br->br_drops;
> +       }
> +
>        /* Fill out the OS statistics structure */
>        ifp->if_ipackets = adapter->stats.gprc;
>        ifp->if_opackets = adapter->stats.gptc;
> @@ -5317,6 +5323,9 @@
>        ifp->if_omcasts = adapter->stats.mptc;
>        ifp->if_collisions = 0;
> 
> +       /* TX drops are stored in if_snd for now, not the top level counters */
> +       ifp->if_snd.ifq_drops = drops;
> +
>        /* Rx Errors */
>        ifp->if_iqdrops = total_missed_rx;
>        ifp->if_ierrors = adapter->stats.crcerrs + adapter->stats.rlec;
> 
> 
> 
> 
> -adrian
> 
> On 16 December 2013 12:19, Adrian Chadd <adrian at freebsd.org> wrote:
>> So I've done some digging.
>> 
>> The TL;DR - we really should expose the drops count in the ifa_data
>> field. But that would involve an ABI change and .. well, that will be
>> full of drama.
>> 
>> The non-TL;DR:
>> 
>> * The send queue drops are in the ifmibdata structure, which includes
>> a few other things. But that requires you use the MIB interface to
>> pull things out, rather than getifaddrs().
>> * getifaddrs() doesn't contain the max sendq length or drops, so we
>> can't report those without adding a MIB fetch for the relevant
>> interface.
>> * bsnmp (which we use at work) correctly populates output discards by
>> checking the MIB for ifmd_snd_drops.
>> 
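
For context, reading ifmd_snd_drops from the ifmib rows looks roughly like the
following userland sketch; the row index is the interface index (e.g. from
if_nametoindex()), and error handling is trimmed. This is only a sketch of the
existing MIB interface, not part of the proposed change:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/sysctl.h>
    #include <sys/time.h>
    #include <net/if.h>
    #include <net/if_mib.h>
    #include <stdio.h>

    /* Fetch the ifmibdata row for one interface and print the send-queue
     * drop counter that getifaddrs() does not expose. */
    static int
    print_snd_drops(int ifindex)
    {
            int name[6] = { CTL_NET, PF_LINK, NETLINK_GENERIC,
                IFMIB_IFDATA, ifindex, IFDATA_GENERAL };
            struct ifmibdata ifmd;
            size_t len = sizeof(ifmd);

            if (sysctl(name, 6, &ifmd, &len, NULL, 0) == -1)
                    return (-1);
            printf("%s: sndq drops %d, sndq maxlen %d\n",
                ifmd.ifmd_name, ifmd.ifmd_snd_drops, ifmd.ifmd_snd_maxlen);
            return (0);
    }
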
>> So I'd rather we fix ixgbe and other drivers by updating the send
>> queue drops counter if the frames are dropped. That way it's
>> consistent with other things.
>> 
>> We should then do some of these things:
>> 
>> * add a send queue drops field to the ifdata/ifadata where appropriate
>> and make sure it gets correctly populated;
>> * teach netstat to use ifmib instead of getifaddrs();
>> * as part of killing the ifsnd queue stuff, we should also migrate
>> that particular drops counter out from there and to a top-level
>> counter in ifnet, like the rest.
>> 
>> Comments?
>> 
>> 
>> -a
>> 
>> 
>> On 14 December 2013 16:06, Adrian Chadd <adrian at freebsd.org> wrote:
>>> oh cool, you just did the output-drops thing I was about to code up.
>>> We're missing those counters at work and the ops guys poked me about
>>> it.
>>> 
>>> I'll also give that a whirl locally and see about working with jack to
>>> get it into -HEAD / MFC'ed to 10.
>>> 
>>> Thanks!
>>> 
>>> 
>>> -a
>>> 
>>> 
>>> On 13 December 2013 21:04, Ryan Stone <rysto32 at gmail.com> wrote:
>>>> I am seeing spurious output packet drops that appear to be due to
>>>> insufficient memory barriers in buf_ring.  I believe that this is the
>>>> scenario that I am seeing:
>>>> 
>>>> 1) The buf_ring is empty, br_prod_head = br_cons_head = 0
>>>> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring.  It fetches
>>>> br_prod_head (0) into a local variable called prod_head
>>>> 3) Thread 2 enqueues an mbuf on the buf_ring.  The sequence of events
>>>> is essentially:
>>>> 
>>>> Thread 2 claims an index in the ring and atomically sets br_prod_head (say to 1)
>>>> Thread 2 sets br_ring[1] = mbuf;
>>>> Thread 2 does a full memory barrier
>>>> Thread 2 updates br_prod_tail to 1
>>>> 
>>>> 4) Thread 2 dequeues the packet from the buf_ring using the
>>>> single-consumer interface.  The sequence of events is essentially:
>>>> 
>>>> Thread 2 checks whether queue is empty (br_cons_head == br_prod_tail),
>>>> this is false
>>>> Thread 2 sets br_cons_head to 1
>>>> Thread 2 grabs the mbuf from br_ring[1]
>>>> Thread 2 sets br_cons_tail to 1
>>>> 
>>>> 5) Thread 1, which is still attempting to enqueue an mbuf on the ring,
>>>> fetches br_cons_tail (1) into a local variable called cons_tail.  It
>>>> sees cons_tail == 1 but prod_head == 0 and concludes that the ring is
>>>> full and drops the packet (incrementing br_drops non-atomically, I
>>>> might add).
>>>> 
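
To make the window concrete, here is a deliberately simplified, self-contained
sketch of the producer-side "full" check in a buf_ring-style ring (not the
actual sys/buf_ring.h code; the critical section and the CAS retry loop are
omitted). The point is that prod_head can be stale while cons_tail is fresh,
so a ring that has just been drained is misread as full:

    #include <errno.h>
    #include <stdint.h>

    struct ring_sketch {
            volatile uint32_t prod_head;    /* next slot a producer will claim */
            volatile uint32_t prod_tail;    /* last slot published to the consumer */
            volatile uint32_t cons_head;    /* next slot the consumer will take */
            volatile uint32_t cons_tail;    /* last slot fully consumed */
            uint32_t          mask;         /* ring size - 1, size is a power of two */
            uint32_t          drops;
            void             *slots[];
    };

    static int
    ring_enqueue_sketch(struct ring_sketch *r, void *buf)
    {
            uint32_t prod_head, prod_next, cons_tail;

            prod_head = r->prod_head;       /* step 2: may be stale (0) ...        */
            cons_tail = r->cons_tail;       /* step 5: ... while this is fresh (1) */
            prod_next = (prod_head + 1) & r->mask;

            /*
             * With the interleaving above, prod_next == cons_tail even though
             * the ring is empty, so the packet is dropped before the CAS on
             * prod_head ever gets a chance to detect the stale read.
             */
            if (prod_next == cons_tail) {
                    r->drops++;             /* plain increment, not atomic */
                    return (ENOBUFS);
            }

            /* ... CAS prod_head to prod_next, store buf, publish prod_tail ... */
            (void)buf;
            return (0);
    }
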
>>>> 
>>>> I can reproduce several drops per minute by configuring the ixgbe
>>>> driver to use only 1 queue and then sending traffic from 8 concurrent
>>>> iperf processes.  (You will need this hacky patch to even see the
>>>> drops with netstat, though:
>>>> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
>>>> 
>>>> I am investigating fixing buf_ring by using acquire/release semantics
>>>> rather than load/store barriers.  However, I note that this will
>>>> apparently be the second attempt to fix buf_ring, and I'm seriously
>>>> questioning whether this is worth the effort compared to the
>>>> simplicity of using a mutex.  I'm not convinced that a correct lockless
>>>> implementation will even be a performance win, given the number of
>>>> memory barriers that will apparently be necessary.
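
As a rough illustration of what the acquire/release pairing might look like on
the index handoff, using the atomic(9) primitives (a sketch of the idea being
considered, not an actual patch): the producer publishes a filled slot with a
release store to the producer tail, and the consumer reads that tail with an
acquire load before touching the slot, which is the ordering the full barrier
provides today.

    #include <sys/types.h>
    #include <machine/atomic.h>

    /*
     * Producer side, after the slot at prod_head has been written: the
     * release store guarantees the slot write is visible before the new
     * tail value.
     */
    static __inline void
    publish_slot(volatile uint32_t *prod_tail, uint32_t prod_next)
    {
            atomic_store_rel_32(prod_tail, prod_next);
    }

    /*
     * Consumer side, before reading the slot at cons_head: the acquire
     * load guarantees the slot read cannot be reordered before the tail
     * check.
     */
    static __inline uint32_t
    observe_producer(volatile uint32_t *prod_tail)
    {
            return (atomic_load_acq_32(prod_tail));
    }
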