sched_lock && thread_lock()

Bruce Evans bde at optusnet.com.au
Mon May 21 10:43:20 UTC 2007


On Mon, 21 May 2007, Jeff Roberson wrote:

> On Mon, 21 May 2007, Attilio Rao wrote:
>>> Sorry, but I strongly disagree.
>> 
>> Ah, and about consistency of the functions you previously described, I
>> assume nothing vital is linked to it.
>> vmmeter is just a statistics collector and nothing else, so I don't expect
>> anything critical/vital to depend on its fields (I'm sure a lot of variables
>> are just bumped up and never decreased, for example).  If that really
>> happens we should fix that behaviour, rather than making things a lot
>> heavier.

The function that I described is only linked to vm paging.  Statistics are
vital for vm paging, though perfectly up to date statistics might not be.

> Well, Attilio is right that in most cases using a lock to save a few atomics 
> is going to be more expensive.  There is also the procedural cost of the lock 
> and the cache miss etc.  However, in some cases there is already a lock 
> available that is protecting the counter.

Also, atomic ops don't help much for VMCNT_GET() and VMCNT_SET():
VMCNT_GET(): atomic ops are not used.  If they were used, then they would
     just reduce race windows, since without a lock the data may change
     immediately after you read it, whether or not you read it atomically.
     Atomic ops are still strictly needed to give atomic accesses to
     variables that cannot naturally be accessed atomically (perhaps because
     they are wider than the bus width), but everything in struct vmmeter
     is u_int so this is only a problem on exotic hardware.
VMCNT_SET(): atomic ops are used.  This doesn't help, since with multiple
     writers, without locks there would be races deciding what to write
     (starting from unlocked values) and then races writing.  I think this
     isn't a problem because VMCNT_SET() is only used in initialization code
     when only one thread is running.

> Furthermore, there are a few cases, most notably paging targets, where code 
> depends on the value of the counters.  For most fields, I believe we have a

Not counting sysctls, these are most cases by static count of the number of
accesses.  Probably by dynamic count too (since changes to the page counts
are much more common than things like forks).  Probably not by static count
of the number of fields (again not counting sysctls), since there are a
lot of fields and most aren't used for paging.

> good approach, however, there are a few that could be minorly improved.  The 
> question is whether it's worth inconsistently accessing the counters to save 
> a few atomics which likely have an immeasurable performance impact.

We already have inconsistent accesses via PCPU_LAZY_INC(), and I think
all cases where you don't really care about the values can be handled
better by PCPU_LAZY_INC().  (Though I don't like PCPU_LAZY_INC(), since
it gives namespace pollution, a larger race window while combining the
values, complications, and missing support for combining the values
in dead kernels.)  Cases that can be handled by PCPU_LAZY_INC() are
distinguished by the variables being monotonically increasing with
time (until wrapping at UINT_MAX breaks them) and nothing except
userland looking at them.  The userland accesses are full of races,
but userland doesn't really care.  If accessing the pcpu is inefficient,
then PCPU_LAZY_INC() can always use atomic ops if that is less inefficient.

Bruce
