buf_ring in HEAD is racy

Adrian Chadd adrian at freebsd.org
Tue Dec 17 15:43:50 UTC 2013


i.e., something like this:

Index: sys/dev/ixgbe/ixgbe.c
===================================================================
--- sys/dev/ixgbe/ixgbe.c       (revision 2995)
+++ sys/dev/ixgbe/ixgbe.c       (working copy)
@@ -5178,6 +5178,7 @@
        struct ixgbe_hw *hw = &adapter->hw;
        u32  missed_rx = 0, bprc, lxon, lxoff, total;
        u64  total_missed_rx = 0;
+       u64  odrops = 0;

        adapter->stats.crcerrs += IXGBE_READ_REG(hw, IXGBE_CRCERRS);
        adapter->stats.illerrc += IXGBE_READ_REG(hw, IXGBE_ILLERRC);
@@ -5308,6 +5309,11 @@
                adapter->stats.fcoedwtc += IXGBE_READ_REG(hw, IXGBE_FCOEDWTC);
        }

+       /* TX drops */
+       for (int i = 0; i < adapter->num_queues; i++) {
+               odrops += adapter->tx_rings[i].br->br_drops;
+       }
+
        /* Fill out the OS statistics structure */
        ifp->if_ipackets = adapter->stats.gprc;
        ifp->if_opackets = adapter->stats.gptc;
@@ -5317,6 +5323,9 @@
        ifp->if_omcasts = adapter->stats.mptc;
        ifp->if_collisions = 0;

+       /* TX drops are stored in if_snd for now, not the top level counters */
+       ifp->if_snd.ifq_drops = odrops;
+
        /* Rx Errors */
        ifp->if_iqdrops = total_missed_rx;
        ifp->if_ierrors = adapter->stats.crcerrs + adapter->stats.rlec;




-adrian

On 16 December 2013 12:19, Adrian Chadd <adrian at freebsd.org> wrote:
> So I've done some digging.
>
> The TL;DR - we really should expose the drops count in the ifa_data
> field. But that would involve an ABI change and .. well, that will be
> full of drama.
>
> The non-TL;DR:
>
> * The send queue drops are in the ifmibdata structure, which includes
> a few other things. But that requires you use the MIB interface to
> pull things out, rather than getifaddrs().
> * getifaddrs() doesn't contain the max sendq length or drops, so we
> can't report those without adding a MIB fetch for the relevant
> interface.
> * bsnmp (which we use at work) correctly populates output discards by
> checking the MIB for ifmd_snd_drops.
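For reference, fetching ifmd_snd_drops the way bsnmp does boils down to a sysctl walk of the ifmib tree. A minimal FreeBSD-only sketch (interface index taken from the command line; error handling kept to a minimum):

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <net/if.h>
#include <net/if_mib.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Pull struct ifmibdata for one interface via the ifmib sysctl tree
 * and print ifmd_snd_drops, the counter bsnmp reports as output
 * discards.  FreeBSD-only; the if_index could also come from
 * if_nametoindex(3).
 */
int
main(int argc, char **argv)
{
	struct ifmibdata ifmd;
	size_t len = sizeof(ifmd);
	int name[6];

	if (argc != 2) {
		fprintf(stderr, "usage: %s if_index\n", argv[0]);
		exit(1);
	}
	name[0] = CTL_NET;
	name[1] = PF_LINK;
	name[2] = NETLINK_GENERIC;
	name[3] = IFMIB_IFDATA;
	name[4] = atoi(argv[1]);	/* interface index */
	name[5] = IFDATA_GENERAL;
	if (sysctl(name, 6, &ifmd, &len, NULL, 0) == -1) {
		perror("sysctl");
		exit(1);
	}
	printf("%s: snd_drops %lu\n", ifmd.ifmd_name,
	    (unsigned long)ifmd.ifmd_snd_drops);
	return (0);
}
```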
>
> So I'd rather we fix ixgbe and other drivers by updating the send
> queue drops counter if the frames are dropped. That way it's
> consistent with other things.
>
> We should then do some of these things:
>
> * add a send queue drops field to the ifdata/ifadata where appropriate
> and make sure it gets correctly populated;
> * teach netstat to use ifmib instead of getifaddrs();
> * as part of killing the ifsnd queue stuff, we should also migrate
> that particular drops counter out from there and to a top-level
> counter in ifnet, like the rest.
>
> Comments?
>
>
> -a
>
>
> On 14 December 2013 16:06, Adrian Chadd <adrian at freebsd.org> wrote:
>> oh cool, you just did the output-drops thing I was about to code up.
>> We're missing those counters at work and the ops guys poked me about
>> it.
>>
>> I'll also give that a whirl locally and see about working with jack to
>> get it into -HEAD / MFC'ed to 10.
>>
>> Thanks!
>>
>>
>> -a
>>
>>
>> On 13 December 2013 21:04, Ryan Stone <rysto32 at gmail.com> wrote:
>>> I am seeing spurious output packet drops that appear to be due to
>>> insufficient memory barriers in buf_ring.  I believe that this is the
>>> scenario that I am seeing:
>>>
>>> 1) The buf_ring is empty, br_prod_head = br_cons_head = 0
>>> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring.  It fetches
>>> br_prod_head (0) into a local variable called prod_head
>>> 3) Thread 2 enqueues an mbuf on the buf_ring.  The sequence of events
>>> is essentially:
>>>
>>> Thread 2 claims an index in the ring and atomically sets br_prod_head (say to 1)
>>> Thread 2 sets br_ring[1] = mbuf;
>>> Thread 2 does a full memory barrier
>>> Thread 2 updates br_prod_tail to 1
>>>
>>> 4) Thread 2 dequeues the packet from the buf_ring using the
>>> single-consumer interface.  The sequence of events is essentially:
>>>
>>> Thread 2 checks whether queue is empty (br_cons_head == br_prod_tail),
>>> this is false
>>> Thread 2 sets br_cons_head to 1
>>> Thread 2 grabs the mbuf from br_ring[1]
>>> Thread 2 sets br_cons_tail to 1
>>>
>>> 5) Thread 1, which is still attempting to enqueue an mbuf on the
>>> ring, fetches br_cons_tail (1) into a local variable called cons_tail.
>>> It sees cons_tail == 1 but prod_head == 0 and concludes that the ring
>>> is full and drops the packet (incrementing br_drops non-atomically, I
>>> might add).
>>>
>>>
>>> I can reproduce several drops per minute by configuring the ixgbe
>>> driver to use only 1 queue and then sending traffic from 8 concurrent
>>> iperf processes.  (You will need this hacky patch to even see the
>>> drops with netstat, though:
>>> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
>>>
>>> I am investigating fixing buf_ring by using acquire/release semantics
>>> rather than load/store barriers.  However, I note that this will
>>> apparently be the second attempt to fix buf_ring, and I'm seriously
>>> questioning whether this is worth the effort compared to the
>>> simplicity of using a mutex.  I'm not even convinced that a correct
>>> lockless implementation will even be a performance win, given the
>>> number of memory barriers that will apparently be necessary.
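For what it's worth, the acquire/release discipline being proposed might look like the following toy single-producer/single-consumer ring in C11 stdatomic. This is a sketch only: the real buf_ring is multi-producer and would additionally need the cmpset loop on br_prod_head; the point here is just that a release store on the publishing index, paired with an acquire load on the opposing index, orders the slot contents without explicit full barriers:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy SPSC ring; one slot is wasted to distinguish full from empty. */
#define RING_SIZE	8
static void *slots[RING_SIZE];
static _Atomic unsigned prod_tail;	/* next free slot */
static _Atomic unsigned cons_tail;	/* next slot to consume */

static int
ring_enqueue(void *item)
{
	unsigned p = atomic_load_explicit(&prod_tail,
	    memory_order_relaxed);
	/* Acquire: pairs with the consumer's release of cons_tail. */
	unsigned c = atomic_load_explicit(&cons_tail,
	    memory_order_acquire);

	if ((p + 1) % RING_SIZE == c)
		return (-1);			/* genuinely full */
	slots[p] = item;
	/* Release: the slot write above is visible before the index. */
	atomic_store_explicit(&prod_tail, (p + 1) % RING_SIZE,
	    memory_order_release);
	return (0);
}

static void *
ring_dequeue(void)
{
	unsigned c = atomic_load_explicit(&cons_tail,
	    memory_order_relaxed);
	/* Acquire: pairs with the producer's release of prod_tail. */
	unsigned p = atomic_load_explicit(&prod_tail,
	    memory_order_acquire);

	if (c == p)
		return (NULL);			/* empty */
	void *item = slots[c];
	atomic_store_explicit(&cons_tail, (c + 1) % RING_SIZE,
	    memory_order_release);
	return (item);
}
```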
>>> _______________________________________________
>>> freebsd-net at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"

