buf_ring in HEAD is racy

Oleg Bulyzhin oleg at FreeBSD.org
Thu Dec 26 17:54:19 UTC 2013


On Sat, Dec 14, 2013 at 12:04:57AM -0500, Ryan Stone wrote:
> I am seeing spurious output packet drops that appear to be due to
> insufficient memory barriers in buf_ring.  I believe that this is the
> scenario that I am seeing:
> 
> 1) The buf_ring is empty, br_prod_head = br_cons_head = 0
> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring.  It fetches
> br_prod_head (0) into a local variable called prod_head
> 3) Thread 2 enqueues an mbuf on the buf_ring.  The sequence of events
> is essentially:
> 
> Thread 2 claims an index in the ring and atomically sets br_prod_head (say to 1)
> Thread 2 sets br_ring[1] = mbuf;
> Thread 2 does a full memory barrier
> Thread 2 updates br_prod_tail to 1
> 
> 4) Thread 2 dequeues the packet from the buf_ring using the
> single-consumer interface.  The sequence of events is essentialy:
> 
> Thread 2 checks whether queue is empty (br_cons_head == br_prod_tail),
> this is false
> Thread 2 sets br_cons_head to 1
> Thread 2 grabs the mbuf from br_ring[1]
> Thread 2 sets br_cons_tail to 1
> 
> 5) Thread 1, which is still attempting to enqueue an mbuf on the ring.
> fetches br_cons_tail (1) into a local variable called cons_tail.  It
> sees cons_tail == 1 but prod_head == 0 and concludes that the ring is
> full and drops the packet (incrementing br_drops unatomically, I might
> add)
> 
> 
> I can reproduce several drops per minute by configuring the ixgbe
> driver to use only 1 queue and then sending traffic from concurrent 8
> iperf processes.  (You will need this hacky patch to even see the
> drops with netstat, though:
> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
> 
> I am investigating fixing buf_ring by using acquire/release semantics
> rather than load/store barriers.  However, I note that this will
> apparently be the second attempt to fix buf_ring, and I'm seriously
> questioning whether this is worth the effort compared to the
> simplicity of using a mutex.  I'm not even convinced that a correct
> lockless implementation will even be a performance win, given the
> number of memory barriers that will apparently be necessary.
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"

I was able to reproduce this bug. I've used 2-headed intel 10g card with
twinax loopback and 4-core amd phenom. Here some results:
- netperf works better for me than iperf in bug exposing, i was able to
  see steady (~20packet per second) packet drop (running 4 netperf clients
  simultaniously (UDP_STREAM test)).

- i think you are right about the reason of those drops: there is a path in 
  buf_ring_enqueue() where packet can be dropped without proper synchronization
  with another instance of buf_ring_enqueue().

- i've looked through buf_ring.h and here is my opinion: race is possible
  between buf_ring_enqueue() and buf_ring_dequeue_mc() which can lead to 
  memory corruption. (though i didn't find any buf_ring_dequeue_mc() users in
  kernel, and this race is not possible on x86/amd64 archs due to their
  memory model, but it should be possible on arm/powerpc).

I've attached patch which should fix all these problems. If there is no
objection i can commit it. Unfortunatly i have no non-x86 hardware to play
with so i can't test enqueue()/dequeue_mc() race myself.

With this patch output drops are rare: in 10 tests (4 netperfs running
UDP_STREAM test for 60 seconds) 8 tests had no drops, one had 2 drops and
another one had 39 drops. These drops happens (i guess) when buf_ring is
_really_ full. Without this patch same test gives me about
20 drops _per second_.

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg at rinet.ru ===
================================================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: buf_ring.h.diff
Type: text/x-diff
Size: 2809 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20131226/9b8b84a6/attachment.diff>


More information about the freebsd-net mailing list