kern/160992: buf_ring(9) statistics accounting not MPSAFE
Bruce Evans
brde at optusnet.com.au
Sun Sep 25 03:10:14 UTC 2011
The following reply was made to PR kern/160992; it has been noted by GNATS.
From: Bruce Evans <brde at optusnet.com.au>
To: Arnaud Lacombe <lacombar at gmail.com>
Cc: freebsd-gnats-submit at freebsd.org, freebsd-bugs at freebsd.org
Subject: Re: kern/160992: buf_ring(9) statistics accounting not MPSAFE
Date: Sun, 25 Sep 2011 13:01:04 +1000 (EST)
On Sat, 24 Sep 2011, Arnaud Lacombe wrote:
>> Description:
> The following block of code, in `sys/sys/buf_ring.h':
>
>
> /*
> * If there are other enqueues in progress
> * that preceeded us, we need to wait for them
> * to complete
> */
> while (br->br_prod_tail != prod_head)
> cpu_spinwait();
> br->br_prod_bufs++;
> br->br_prod_bytes += nbytes;
> br->br_prod_tail = prod_next;
> critical_exit();
>
>
> can be seen at runtime, memory-wise as:
>
> while (br->br_prod_tail != prod_head)
> cpu_spinwait();
> br->br_prod_tail = prod_next;
> br->br_prod_bufs++;
> br->br_prod_bytes += nbytes;
> critical_exit();
>
> That is, there is no memory barrier to enforce completion of the
> load/increment/store/load/load/addition/store operations before
> updating what other thread spin on.
The counters are 64 bits, so it also does non-atomic increments of them
no 32-bit arches.
> Even if `br_prod_tail' is marked `volatile', there is no guarantee that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and `br_prod_bytes').
Using volatile is generally bogus. Here it seems to mainly give
pessimizations and more opportunities for bad memory orders. The i386
code for incrementing a 64-bit volatile x is:
movl x, %eax
movl x+4, %edx
addl $1, %eax
adcl $0, %edx
movl %eax, x
movl %edx, x+4
while for a 64-bit non-volatile it is:
addl $1, x
adcl $0, x+4
so volatile gives more caching in registers instead of less. The following
are some of the bad memory orders possible:
with volatile:
lo = br->br_prod_bytes.lo;
hi = br->br_prod_bytes.hi;
br->br_prod_tail = prod_next;
br->br_prod_bufs++;
lo += nbytes;
hi += carry;
br->br_prod_bytes.hi = hi;
br->br_prod_bytes.lo = lo;
without volatile:
br->br_prod_bytes.lo += nbytes;
br->br_prod_tail = prod_next;
br->br_prod_bufs++;
br->br_prod_bytes.hi += carry;
I think the token method would make the nonatomic accesses to the
counters sufficiently atomic if it worked. The necessary memory
barriers would probably have a memory clobber which effectively makes
all memory variables transiently volatile (where volatile actually
means non-volatile with respect to them changing -- holding the token
prevents them changing -- but actually means volatile with respect to
their memory accesses) so the effect of declaring the counters permanently
volatile would be reduced to a pessimization. Even reads of them in
sysctls must hold the token to get a consistent snapshot.
Bruce
More information about the freebsd-bugs
mailing list