kern/160992: buf_ring(9) statistics accounting not MPSAFE

Bruce Evans brde at optusnet.com.au
Sun Sep 25 03:01:12 UTC 2011


On Sat, 24 Sep 2011, Arnaud Lacombe wrote:

>> Description:
> The following block of code, in `sys/sys/buf_ring.h':
>
>
>       /*
>        * If there are other enqueues in progress
>        * that preceeded us, we need to wait for them
>        * to complete
>        */
>       while (br->br_prod_tail != prod_head)
>               cpu_spinwait();
>       br->br_prod_bufs++;
>       br->br_prod_bytes += nbytes;
>       br->br_prod_tail = prod_next;
>       critical_exit();
>
>
> can be seen at runtime, memory-wise as:
>
>      while (br->br_prod_tail != prod_head)
>              cpu_spinwait();
>      br->br_prod_tail = prod_next;
>      br->br_prod_bufs++;
>      br->br_prod_bytes += nbytes;
>      critical_exit();
>
> That is, there is no memory barrier to enforce completion of the
> load/increment/store/load/load/addition/store operations before
> updating what other thread spin on.

The counters are 64 bits, so it also does non-atomic increments of them
no 32-bit arches.

> Even if `br_prod_tail' is marked `volatile', there is no guarantee that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes').

Using volatile is generally bogus.  Here it seems to mainly give
pessimizations and more opportunities for bad memory orders.  The i386
code for incrementing a 64-bit volatile x is:

 	movl	x, %eax
 	movl	x+4, %edx
 	addl	$1, %eax
 	adcl	$0, %edx
 	movl	%eax, x
 	movl	%edx, x+4

while for a 64-bit non-volatile it is:

 	addl	$1, x
 	adcl	$0, x+4

so volatile gives more caching in registers instead of less.  The following
are some of the bad memory orders possible:

with volatile:
        lo = br->br_prod_bytes.lo;
        hi = br->br_prod_bytes.hi;
        br->br_prod_tail = prod_next;
        br->br_prod_bufs++;
        lo += nbytes;
        hi += carry;
        br->br_prod_bytes.hi = hi;
        br->br_prod_bytes.lo = lo;

without volatile:

        br->br_prod_bytes.lo += nbytes;
        br->br_prod_tail = prod_next;
        br->br_prod_bufs++;
        br->br_prod_bytes.hi += carry;

I think the token method would make the nonatomic accesses to the
counters sufficiently atomic if it worked.  The necessary memory
barriers would probably have a memory clobber which effectively makes
all memory variables transiently volatile (where volatile actually
means non-volatile with respect to them changing -- holding the token
prevents them changing -- but actually means volatile with respect to
their memory accesses) so the effect of declaring the counters permanently
volatile would be reduced to a pessimization.  Even reads of them in
sysctls must hold the token to get a consistent snapshot.

Bruce


More information about the freebsd-bugs mailing list