Updated rusage patch

Jeff Roberson jroberson at chesapeake.net
Fri Jun 1 08:57:29 UTC 2007


On Fri, 1 Jun 2007, Bruce Evans wrote:

>> The extra newline?  Isn't that a style violation?
>
> This is arguable.  Although strict KNF may require no extra blank lines
> (indent -sob), and this rule is not too bad for block comments because
> the line with the "/*" provides some separation, there are many examples
> of old code in kern, including many in this file, where a blank line
> is left before the block comment.  In the 4.4Lite2 kern_exit.c, the
> count is approx. 10:8 in favour of no extra blank line before block
> comments within functions (not counting cases where there clearly
> shouldn't be one, e.g., after #ifdef).  In the current kern_exit.c
> after this change was committed, the count is 32:13 in favour of a
> blank line (down from 33:12 before this change).
>
> My version of style.9 requires leaving a blank line before comments
> and not leaving a blank line before statements (if you want to leave
> a blank line between statements to form separate blocks of statements,
> there must be a comment at the start of each new block).

I see; in that case, I'll follow this rule in the future.  It seems
sensible to me.
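
For instance (a made-up snippet just to illustrate my reading of the
rule; the function names are hypothetical):

	error = do_the_thing(p);
	if (error)
		return (error);

	/*
	 * A blank line belongs before this block comment.  A new
	 * block of statements would not get a blank line unless it
	 * starts with a comment like this one.
	 */
	finish_the_thing(p);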

>>> These bugs can probably be fixed by moving the copying to the final
>>> thread_exit().  Maybe acct_process() can be moved there too, but I
>>> suspect too much of the process has gone away by then for acct_process()
>>> to work.
>> 
>> I tried moving the rusage summing and the assignment to p_ru to later
>> in this function.  This led to a race that produced "runtime went
>> backwards" printfs because the runtime became 0.  I didn't quite
>> understand this, since we check for PRS_ZOMBIE in wait(), but I also
>> didn't investigate terribly far.
>
> Some technical problem.

Well, I think the whole exit/wait path could probably use some attention.
There is an incredible amount of locking and unlocking to deal with
various races and lock-order issues, and there are many subtle effects
that aren't immediately obvious.  I'm nervous about how many bugs we
might introduce if we start down this path so close to 7.0.

>
>> I believe the sched lock locking is necessary for rux.  For rusage, 
>> however, it's only ever modified by curthread.  The only races that are 
>> present when reading without locks are potentially inconsistent rusage 
>> where the stats are not all from a single snapshot of the program. However, 
>> the stats can obviously change before the copy makes it back to user-space 
>> anyhow.  I don't see a real race that needs protecting.
>
> No, at least the ru_*rss fields are updated by statclock().  These
> seem to be the only problematic fields.  They are unimportant.  However,
> they should be easy to protect correctly using the same lock as the
> td_*ticks fields.  They are still protected by sched_lock in statclock()
> and rufetch() but not here.
>
> Hmm, this is confusing.  Normal locking is not used for thread-local
> fields.  Instead, a side effect of spinlocking is used:
> mtx_lock_spin(&any) in non-interrupt code has the side effect of
> masking interrupts on the current CPU, so statclock() can access
> anything on the current CPU without acquiring any locks, just like
> an interrupt handler on a UP system.  This is used for td_*ticks.
> It doesn't work for ru_*rss since exit1() doesn't hold a
> spinlock when copying td_ru.  The sched_locking of ru_*rss in
> statclock() doesn't help -- I think it now does nothing except
> waste time, since these fields are now thread-local so they need
> the same locking as td_*ticks, which is null in statclock() but
> non-null elsewhere.

Yes, you're right.  I'll move the copying under the scheduler lock in 
exit1() so we can't lose any of the rss information.
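
Concretely, something along these lines (a sketch only; rucollect() and
td_ru are the names from the patch, and the exact spot in exit1() may
differ):

	mtx_lock_spin(&sched_lock);
	/*
	 * mtx_lock_spin() masks interrupts on this CPU, so a
	 * statclock() tick cannot update td_ru (in particular the
	 * ru_*rss fields) while it is folded into the process totals.
	 */
	rucollect(&p->p_stats->p_ru, &td->td_ru);
	mtx_unlock_spin(&sched_lock);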

>
> Related bugs:
> - td_[usip]ticks are still under (j) (sched_lock) in proc.h.
> - td_{uu,us}ticks have always (?) been under (k) (thread-local).  That
>  is more correct than (j), but they are updated by an interrupt handler
>  and seem to be accessed without disabling interrupts elsewhere.

Well, td_[uisp]ticks are set and cleared while sched_lock is held.  In
threadlock.diff the thread lock is responsible for this.  That reminds me
that I didn't add the per-thread locking to rufetch() in the patch I
posted earlier.
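
The missing piece would look roughly like this (a sketch against
threadlock.diff, simplified; thread_lock()/thread_unlock() are from that
patch, rucollect() from the rusage patch, and the caller is assumed to
hold the proc lock so the thread list stays stable):

	void
	rufetch(struct proc *p, struct rusage *ru)
	{
		struct thread *td;

		bzero(ru, sizeof(*ru));
		FOREACH_THREAD_IN_PROC(p, td) {
			thread_lock(td);	/* per-thread lock, not sched_lock */
			rucollect(ru, &td->td_ru); /* fold in this thread's stats */
			thread_unlock(td);
		}
	}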

Thanks,
Jeff

>
>> This is why I believe the code in exit1() is also safe
>
> Modification by only curthread isn't quite enough -- see above.
>
>> although technically it could lose some information on the way to
>> thread_exit().  To fix this we'd have to do the final accumulation
>> under sched_lock the last time we lock it.  However, there are other
>> races in the proc code that were not immediately obvious.  Also, doing
>> the accumulation here rules out running process accounting after p_ru
>> is valid, which is not done now.
>
> I see.
>
>> Related to rusage but unrelated to my patch: why are ixrss, idrss, and
>> isrss expressed in units of bytes*ticks of execution when rusage
>> doesn't report the ticks of execution but rather a timeval?  Are
>> consumers expected to sum the timevals and then extrapolate?
>
> I don't really know, but guess this is historical.  Ticks can be managed
> more efficiently than timevals.  There are remarkably few consumers
> - mainly time(1), csh and window(1) in /usr/src.  csh still seems to
>  hard-code the tick frequency as 100.  This would make all rss values
>  off by a factor of stathz/100.  time(1) hard-coded the frequency as
>  100 in FreeBSD-1.1.
>
> Bruce
>
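
For what it's worth, the consumer-side extrapolation Bruce describes
above looks roughly like this (illustrative userland code, not csh's
actual source; the hard-coded 100 is exactly the bug he mentions):

	#include <sys/resource.h>

	/*
	 * Recover an average rss from the integral field, the way
	 * time(1)/csh historically did: ru_ixrss is memory * ticks
	 * of execution, so dividing by the run time in ticks gives
	 * the average.  Hard-coding 100 Hz makes the result off by
	 * a factor of stathz/100 whenever stathz != 100.
	 */
	static long
	avg_ixrss(const struct rusage *ru)
	{
		long ticks;

		ticks = (ru->ru_utime.tv_sec + ru->ru_stime.tv_sec) * 100 +
		    (ru->ru_utime.tv_usec + ru->ru_stime.tv_usec) / 10000;
		if (ticks == 0)
			return (0);
		return (ru->ru_ixrss / ticks);
	}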

