kern/160992: buf_ring(9) statistics accounting not MPSAFE

Bruce Evans brde at optusnet.com.au
Sun Sep 25 03:10:14 UTC 2011


The following reply was made to PR kern/160992; it has been noted by GNATS.

From: Bruce Evans <brde at optusnet.com.au>
To: Arnaud Lacombe <lacombar at gmail.com>
Cc: freebsd-gnats-submit at freebsd.org, freebsd-bugs at freebsd.org
Subject: Re: kern/160992: buf_ring(9) statistics accounting not MPSAFE
Date: Sun, 25 Sep 2011 13:01:04 +1000 (EST)

 On Sat, 24 Sep 2011, Arnaud Lacombe wrote:
 
 >> Description:
 > The following block of code, in `sys/sys/buf_ring.h':
 >
 >
 >       /*
 >        * If there are other enqueues in progress
 >        * that preceeded us, we need to wait for them
 >        * to complete
 >        */
 >       while (br->br_prod_tail != prod_head)
 >               cpu_spinwait();
 >       br->br_prod_bufs++;
 >       br->br_prod_bytes += nbytes;
 >       br->br_prod_tail = prod_next;
 >       critical_exit();
 >
 >
 > can be seen at runtime, memory-wise as:
 >
 >      while (br->br_prod_tail != prod_head)
 >              cpu_spinwait();
 >      br->br_prod_tail = prod_next;
 >      br->br_prod_bufs++;
 >      br->br_prod_bytes += nbytes;
 >      critical_exit();
 >
 > That is, there is no memory barrier to enforce completion of the
 > load/increment/store/load/load/addition/store operations before
 > updating what other thread spin on.
 
 The counters are 64 bits, so it also does non-atomic increments of them
 no 32-bit arches.
 
 > Even if `br_prod_tail' is marked `volatile', there is no guarantee that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes').
 
 Using volatile is generally bogus.  Here it seems to mainly give
 pessimizations and more opportunities for bad memory orders.  The i386
 code for incrementing a 64-bit volatile x is:
 
  	movl	x, %eax
  	movl	x+4, %edx
  	addl	$1, %eax
  	adcl	$0, %edx
  	movl	%eax, x
  	movl	%edx, x+4
 
 while for a 64-bit non-volatile it is:
 
  	addl	$1, x
  	adcl	$0, x+4
 
 so volatile gives more caching in registers instead of less.  The following
 are some of the bad memory orders possible:
 
 with volatile:
         lo = br->br_prod_bytes.lo;
         hi = br->br_prod_bytes.hi;
         br->br_prod_tail = prod_next;
         br->br_prod_bufs++;
         lo += nbytes;
         hi += carry;
         br->br_prod_bytes.hi = hi;
         br->br_prod_bytes.lo = lo;
 
 without volatile:
 
         br->br_prod_bytes.lo += nbytes;
         br->br_prod_tail = prod_next;
         br->br_prod_bufs++;
         br->br_prod_bytes.hi += carry;
 
 I think the token method would make the nonatomic accesses to the
 counters sufficiently atomic if it worked.  The necessary memory
 barriers would probably have a memory clobber which effectively makes
 all memory variables transiently volatile (where volatile actually
 means non-volatile with respect to them changing -- holding the token
 prevents them changing -- but actually means volatile with respect to
 their memory accesses) so the effect of declaring the counters permanently
 volatile would be reduced to a pessimization.  Even reads of them in
 sysctls must hold the token to get a consistent snapshot.
 
 Bruce


More information about the freebsd-bugs mailing list