svn commit: r252032 - head/sys/amd64/include

Mon Jun 24 14:22:01 UTC 2013

[snipping everything about counter64, atomic ops, cycles, etc.]

I wonder if the idea explained in this paper:

http://static.usenix.org/event/usenix03/tech/freenix03/full_papers/mcgarry/mcgarry_html/

which seems to be used in FreeBSD for some ARM atomics (look for
ARM_RAS_START in this file):

http://svnweb.freebsd.org/base/head/sys/arm/include/atomic.h?view=annotate

would be more efficient.

To summarize: one marks a section of code such that if a thread is
interrupted while executing it, the thread restarts at the beginning of
the section instead of resuming where it was interrupted.  This has been
used to implement atomic increment on some hardware that lacks the
necessary instructions.  Here it could be used to implement atomic
increment on the per-CPU counter without the overhead of an atomic
instruction.
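
A minimal sketch of the restart rule, assuming the section is described
by a start/end address pair.  The names ras_region and ras_resume_pc are
made up for illustration and are not taken from the FreeBSD tree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one restartable section (illustrative names). */
struct ras_region {
	uintptr_t start;	/* address of the first instruction */
	uintptr_t end;		/* address one past the last instruction */
};

/*
 * Given the program counter saved when the thread was interrupted,
 * return the address at which it should resume: the start of the
 * section if it was interrupted inside it, else the saved address.
 */
static uintptr_t
ras_resume_pc(const struct ras_region *r, uintptr_t saved_pc)
{
	assert(r != NULL);
	if (saved_pc >= r->start && saved_pc < r->end)
		return (r->start);
	return (saved_pc);
}
```

The only work outside the section itself is this range check, which is
why the stores that mark the section can be plain, non-atomic ones.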

Marking the section of code that does the increment takes multiple
stores, but they're all per-CPU or per-thread.  That may be cheaper than
an atomic increment, at least on 32-bit platforms that are doing an
atomic 64-bit increment.

I haven't benchmarked this (ENOTIME, plus I'm on vacation right now), but
using restartable sections requires three stores (add an item to a linked
list, 64-bit increment for the counter, remove an item from a linked list).
Some of the penalty is paid in the context switch code, which has to
check if the instruction pointer is in one of these critical sections.  I
haven't checked to see if this code is enabled on all architectures or just
ARM.  But if context switches are less frequent than counter increments in
the networking code, it's still a win for most uses.
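
A rough sketch of the three-store sequence, with the linked list reduced
to a single per-CPU slot to keep it short.  All names here (pcpu_sketch,
ras_section, counter_inc_ras) are hypothetical, and real code would
probably be written in assembly so the section's instruction bounds are
known exactly.  On a 32-bit platform the 64-bit update may also compile
to more than one store instruction, which this sketch does not address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of the code range doing the increment. */
struct ras_section {
	uintptr_t start;
	uintptr_t end;
};

/* Hypothetical per-CPU data: the active-section slot and the counter. */
struct pcpu_sketch {
	/* volatile so the compiler keeps both marker stores */
	const struct ras_section *volatile ras_active;
	uint64_t counter;
};

static void
counter_inc_ras(struct pcpu_sketch *pc, const struct ras_section *sec)
{
	assert(pc != NULL && sec != NULL);
	pc->ras_active = sec;	/* store 1: mark the section as active */
	pc->counter += 1;	/* store 2: the increment itself */
	pc->ras_active = NULL;	/* store 3: unmark the section */
}
```

None of the three stores is an atomic read-modify-write; the cost moves
to the context switch path, which would have to inspect ras_active.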

Thanks,
matthew