[PATCH] RUSAGE_THREAD

Kostik Belousov kostikbel at gmail.com
Thu May 6 13:47:27 UTC 2010


On Thu, May 06, 2010 at 04:42:39PM +0400, Alexander Krizhanovsky wrote:
> On 05/05/10 01:24, Kostik Belousov wrote:
> > On Wed, May 05, 2010 at 12:48:53AM +0000, Alexander Krizhanovsky wrote:
> >   
> >> Konstantin,
> >>
> >> Concerning the I/O counters: we collect them in rucollect() in a for
> >> loop and update them in various places, for example in vfs_bio.c. The
> >> rusage of an exiting thread is handled in exit1() -> ruadd().
> >>
> >> Your version of the patch is certainly more robust; however, I'm still
> >> concerned about the following things, which could be done better:
> >>
> >> 1) Zeroing the thread runtime statistics looks fine if we use them
> >> only for calculating the whole-process statistics, but now (when they
> >> are also used directly for the per-thread statistics) they should be
> >> handled independently, i.e. RUSAGE_{SELF,CHILDREN} calls should not
> >> affect the thread structures in any way. So I suppose we need to
> >> introduce some new counters in the proc structure, plus the code to
> >> update them (this is what stopped me from going that way).
> >>     
> > What do you mean by zeroing of the thread runtime statistics? Can you
> > please point me to the exact location in the code? I did not find such
> > code when I did the initial review of your patch.
> >
> > I did testing by repeatedly calling getrusage with alternating
> > RUSAGE_SELF/RUSAGE_THREAD commands, and got sane, increasing snapshots
> > of the statistics.
> >   
> 
> I meant the zeroing of the per-thread statistics in ruxagg() after the
> whole-process time counters are updated.
I still do not understand which code you are referencing. Please note that
ruxagg() does not clear any thread statistics. It does clear the accumulated
tick counts for the thread, but only after moving them _both_ to the
per-process and the per-thread resource usage storage.
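To make the aggregation flow concrete, here is a hedged userland sketch; the structure and function names are illustrative analogues, not the actual kern_resource.c code. The point is that the tick accumulator is folded into both the per-process and the per-thread totals before being cleared, so the per-thread totals themselves are never zeroed:

```c
#include <assert.h>

/* Illustrative analogue of the ruxagg() flow; not the real kernel layout. */
struct rusage_ext_sketch {
	long rux_ticks;		/* accumulated runtime ticks */
};

struct thread_sketch {
	struct rusage_ext_sketch td_rux;	/* per-thread totals (RUSAGE_THREAD) */
	long td_incruntime;			/* ticks not yet folded into totals */
};

struct proc_sketch {
	struct rusage_ext_sketch p_rux;		/* per-process totals (RUSAGE_SELF) */
};

/*
 * Fold the thread's accumulated ticks into BOTH the process totals and
 * the thread totals, then clear only the accumulator.  Neither set of
 * totals is ever zeroed, so alternating RUSAGE_SELF/RUSAGE_THREAD
 * queries see monotonically increasing values.
 */
static void
ruxagg_sketch(struct proc_sketch *p, struct thread_sketch *td)
{
	p->p_rux.rux_ticks += td->td_incruntime;
	td->td_rux.rux_ticks += td->td_incruntime;
	td->td_incruntime = 0;
}
```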

Please see the HEAD code that contains the committed patch.
> 
> 
> >   
> >> 2) With the first point in mind, getrusage(RUSAGE_THREAD) is called
> >> in the current thread and doesn't affect or use information from
> >> other threads, so it definitely should use fewer locks, maybe with
> >> atomic operations for the read-update-write sequences. In fact,
> >> getting the same per-thread statistics on Linux is done _without_
> >> locks at all (however, Linux uses a different process/thread model).
> >>     
> > thread_lock() is a spinlock, and it disables preemption. calcru1() is
> > very sensitive to having the tick counters in a consistent state.
> > You can look at the kern_time.c implementation of CLOCK_THREAD_CPUTIME_ID,
> > where indeed only preemption is disabled, by use of a critical section.
> >
> > On the other hand, td_rux is accessed by other threads, and calcru1()
> > updates should be properly synchronized. Since thread_lock would be
> > needed for this anyway, and it gives slightly more consistent results
> > for the copy of td_ru, I used it.
> >
> > I do not think that thread_lock for a running thread is a bottleneck,
> > and getrusage definitely should not be a contention point for a
> > properly written application.
> >
> >   
> 
> Yes, you're right - if we wish to have lockless thread accounting, then
> we need to redesign the whole rusage accounting, so I'm voting for
> committing the patch.
The patch was committed two days ago.

> 
> Indeed, I was more concerned about PROC_LOCK/PROC_SLOCK - both are
> acquired in kern_getrusage() and could be a problem in a multithreaded
> process with intensive per-thread performance accounting. This happens,
> for example, in resource accounting systems for shared hosting, which
> are implemented as custom extensions to Apache, MySQL and other servers.

We still need the per-process lock for the resource summation code. On the
other hand, I do not understand why p_rux has to be protected by the spinlock
and not by the process mutex. Hm, a spinlock disables preemption, but only
for curthread. The process spinlock is not acquired in mi_switch(), so I
really do not see a reason not to change the protection to a mutex.

Any idea ?
> 
> 
> >> If there is still time and it really looks like a problem (is
> >> getrusage() ever a hotspot?), I can try to prepare a patch with less
> >> locking this weekend.
> >>
> >> Also, I still don't understand the sanity check in calcru1() for
> >> negativity of tu ( (int64_t)tu < 0 ) - is it reachable? It seems it
> >> is possible only after about 300 thousand years of uptime... Could
> >> you please explain this?
> >>     
> > I never saw this message; maybe changing it to an assertion, as
> > proposed in phk's comment, is reasonable. I do not have an opinion.
> >
> >   
> 
> I asked about it in my message from the 3rd of May, and there I also
> proposed the following patch for calcru1():
> 
> -       if ((int64_t)tu < 0) {
> -               /* XXX: this should be an assert /phk */
> -               printf("calcru: negative runtime of %jd usec for pid %d (%s)\n",
> -                   (intmax_t)tu, p->p_pid, p->p_comm);
> -               tu = ruxp->rux_tu;
> -       }
> +       KASSERT((int64_t)tu >= 0, ("calcru: negative runtime of %jd usec "
> +                               "for pid %d (%s)",
> +                               (intmax_t)tu, p->p_pid, p->p_comm));
> 
I did understand your proposal, but, as I said, I have no opinion.
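One detail worth double-checking in any such conversion: KASSERT() panics when its condition is *false*, so the condition must state the invariant, (int64_t)tu >= 0, rather than the error case that the old printf tested for. A hedged userland analogue using assert(3), with an illustrative helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland analogue of the calcru1() sanity check.  Like KASSERT(),
 * assert() aborts when its condition is FALSE, so the condition
 * expresses the invariant that must hold ("tu is non-negative"),
 * not the error case ("tu is negative") that the printf tested for.
 */
static int64_t
check_runtime(uint64_t tu)
{
	assert((int64_t)tu >= 0 && "calcru: negative runtime");
	return ((int64_t)tu);
}
```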

