[PATCH] RUSAGE_THREAD

Alexander Krizhanovsky ak at natsys-lab.com
Thu May 6 08:44:27 UTC 2010


On 05/05/10 01:24, Kostik Belousov wrote:
> On Wed, May 05, 2010 at 12:48:53AM +0000, Alexander Krizhanovsky wrote:
>   
>> Konstantin,
>>
>> Concerning i/o counters we collect them in rucollect() in for loop and 
>> update in various places, for example in vfs_bio.c. Rusage of an exiting 
>> threads is handled in exit1() -> ruadd().
>>
>> Your version of the patch certainly is more robust, however I'm still 
>> concerning about following things, which could be done better:
>>
>> 1) Zeroing of thread runtime statistic looks fine if we use it only for 
>> whole process statistic calculating, but now (when it's also used as is 
>> for the thread statistic) it should be handled independently, i.e. 
>> RUSAGE_{SELF,CHILDREN} calls should not affect the thread structures 
>> somehow. So I suppose we need to introduce some new counters to proc 
>> structure and counters update code (it was stopped me to go on this way).
>>     
> What do you mean by zeroing of thread runtime statistic ? Can you, please,
> point me to exact location in code ? I did not found such code when
> I did initial review of your patch.
>
> I did testing by repeatedly calling getrusage with alternated
> RUSAGE_SELF/RUSAGE_THREAD commands, and got sane increasing snapshots
> of statistic.
>   

I meant zeroing of per-thread statistic in ruxagg() after updating of
the whole process time counters.


>   
>> 2) With first in mind, getruasge(RUSAGE_THREAD) is called in current 
>> thread and doesn't affect or use information from other threads, so it 
>> definitely should use less number of locks, may be with atomic 
>> operations for read-update-write operations. In fact the same getting 
>> per-thread statistic in Linux is done _without_ locks at all (however 
>> Linux uses different process/thread model).
>>     
> thread_lock() is spinlock, and it disables preemption. calcru1() is
> very sensible to have ticks counters to be in consistent state.
> You can look at kern_time.c implementation of CLOCK_THREAD_CPUTIME_ID,
> where indeed only preepmtion is disabled by use of critical section.
>
> On the other hand, td_rux is accessed by other threads, and caclru1()
> updates should be properly syncronized. Since thread_lock would
> be needed for this, and it would give slightly more consistent results
> for the copy of td_ru, I used it.
>
> I do not think that thread_lock for running thread is a bottleneck,
> and getrusage definitely should be not a contention point for properly
> written application.
>
>   

Yes, you're right - if we wish to have lockless thread accounting, then
we need to redesign the whole rusage accounting, so I'm voting for
commit of the patch.

Indeed, I concerned more about PROC_LOCK/PROC_SLOCK - they are acquired
both in kern_getrusage() and could be a problem in multithreaded process
with intensive performance accounting in threads. This happens for
example in resource accounting systems for shared hosting which are
solved by custom extensions of Apache, MySQL and other servers.


>> If we're in time and it really looks like a problem (is getrusage() ever 
>> a hotspot?) I can try to prepare the patch with less locking on this 
>> weekend.
>>
>> Also I still don't understand the sanity check in calccru1() for 
>> negativeness of tu ( (int64_t)tu < 0 ) - is it reachable? It seems that 
>> it is possible only after about 300 thousand years of uptime... Could 
>> you please explain this?
>>     
> I never saw this message, may be change it to assertion, as proposed
> in phk comment, is reasonable. I do not have an opinion.
>
>   

I asked about it in my message from 3rd of May and also there I've
proposesed following patch for calcru1():

-       if ((int64_t)tu < 0) {
-               /* XXX: this should be an assert /phk */
-               printf("calcru: negative runtime of %jd usec for pid %d
(%s)\n",
-                   (intmax_t)tu, p->p_pid, p->p_comm);
-               tu = ruxp->rux_tu;
-       }
+       KASSERT((int64_t)tu < 0, ("calcru: negative runtime of %jd usec "
+                               "for pid %d (%s)\n",
+                               (intmax_t)tu, p->p_pid, p->p_comm));




-- 
Alexander Krizhanovsky
NatSys Lab. (http://natsys-lab.com)
tel: +7 (916) 717-3899, +7 (499) 747-6304
e-mail: ak at natsys-lab.com



More information about the freebsd-hackers mailing list