ps "time" field jumps backward

Jilles Tjoelker jilles at stack.nl
Sun Feb 7 16:43:43 UTC 2010


On Fri, Feb 05, 2010 at 05:45:01PM -0500, Linda Messerschmidt wrote:
> On Fri, Feb 5, 2010 at 4:28 PM, Dan Nelson <dnelson at allantgroup.com> wrote:
> > Ideally, top and ps would total up all
> > the per-thread CPU counts when displaying the per-process numbers, but it
> > doesn't seem to.

> It does seem to total them:

> $ ps axHo pid,lwp,time,wchan,comm | awk '$1 == 1647'
>  1647 100401   0:00.63 select mysqld
>  1647 100466   0:11.08 sigwai mysqld
>  1647 100521   0:00.00 ucond  mysqld
> $ ps axo pid,lwp,time,wchan,comm | awk '$1 == 1647'
>  1647 100521   0:11.71 ucond  mysqld

> But you put me on the right track.  I ran both side by side for
> a while, and found that ps/top only sums up those threads that haven't
> already exited.  I.e., once a thread has exited, it's as if its usage
> never happened from the perspective of ps and top's total calculation.

> That seems like undesirable behavior, particularly if it conceals
> CPU-churning behavior by short-lived threads, but possibly very hard
> to change. :(

> I wonder if the system accounting records are more accurate?

It should not be "very hard" to fix the time field, because the rusage
struct itself is correct. This can be seen in the ^T status line and,
after the process has terminated, in the output of time(1), the shell's
times builtin, and ps S.
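
As an aside, here is a minimal sketch (not part of the original report)
showing that the process-wide rusage keeps an exited thread's time: the
user time printed after pthread_join() still includes the work done by
the joined thread, which is why the rusage-based numbers stay monotonic
while ps's per-thread sum does not.

#include <sys/time.h>
#include <sys/resource.h>
#include <pthread.h>
#include <stdio.h>
#include <err.h>

/* Spin for a while to accumulate some CPU time, then return. */
static void *
spin(void *unused)
{
	volatile unsigned long i;

	for (i = 0; i < 400000000UL; i++)
		;
	return (NULL);
}

int
main(void)
{
	struct rusage ru;
	pthread_t td;
	int error;

	error = pthread_create(&td, NULL, spin, NULL);
	if (error != 0)
		errc(1, error, "pthread_create");
	error = pthread_join(td, NULL);
	if (error != 0)
		errc(1, error, "pthread_join");
	if (getrusage(RUSAGE_SELF, &ru) == -1)
		err(1, "getrusage");
	/* The exited thread's user time is still part of the process total. */
	printf("user time after join: %ld.%06ld s\n",
	    (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
	return (0);
}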

In the kernel code, sys/kern/kern_proc.c, it seems that
fill_kinfo_proc_only() fills in the correct ki_runtime from
p->p_rux.rux_runtime, but fill_kinfo_aggregate() later overwrites it
with the sum of the remaining threads' td->td_runtime. Removing the
bogus calculation from fill_kinfo_aggregate() fixes ps(1)'s time field,
but not the %cpu field, nor anything in top(1). The latter is because
top(1) always requests information about threads and internally does
the same wrong calculation as fill_kinfo_aggregate().
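
For reference, the field in question can be inspected from userland
roughly as follows. This is only a sketch using the kern.proc.pid
sysctl (not code taken from ps(1) or top(1)); the program name and
error messages are made up for illustration.

#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
	struct kinfo_proc kp;
	size_t len;
	int mib[4];

	if (argc != 2)
		errx(1, "usage: kiruntime pid");
	mib[0] = CTL_KERN;
	mib[1] = KERN_PROC;
	mib[2] = KERN_PROC_PID;
	mib[3] = atoi(argv[1]);
	len = sizeof(kp);
	if (sysctl(mib, 4, &kp, &len, NULL, 0) == -1)
		err(1, "sysctl kern.proc.pid");
	/* ki_runtime is in microseconds; ps derives its time field from it. */
	printf("ki_runtime: %ju us\n", (uintmax_t)kp.ki_runtime);
	return (0);
}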

Fixing the %cpu field would require changes to the scheduler. The
schedulers have functions to propagate CPU usage from terminating child
processes and threads, but these seem to affect scheduling decisions
only, not the %cpu field. Note that the CPU usage is always charged to
the most recently created thread in the process, not necessarily to the
thread that called or will call fork(), pthread_create(), waitpid() or
pthread_join().

If the thread that this usage is charged to could be chosen better, it
could be useful to add that CPU time into the displayed figures as
well.

Index: sys/kern/kern_proc.c
===================================================================
--- sys/kern/kern_proc.c	(revision 203549)
+++ sys/kern/kern_proc.c	(working copy)
@@ -676,11 +676,9 @@
 
 	kp->ki_estcpu = 0;
 	kp->ki_pctcpu = 0;
-	kp->ki_runtime = 0;
 	FOREACH_THREAD_IN_PROC(p, td) {
 		thread_lock(td);
 		kp->ki_pctcpu += sched_pctcpu(td);
-		kp->ki_runtime += cputick2usec(td->td_runtime);
 		kp->ki_estcpu += td->td_estcpu;
 		thread_unlock(td);
 	}

Test program that starts threads which each waste about 1 second of
CPU time before being cancelled:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>

/*
 * Burn CPU until the thread is cancelled.  Asynchronous cancellation is
 * needed because the busy loop never reaches a cancellation point.
 */
void *
threadfunc(void *unused)
{
	int error;

	error = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
	if (error != 0)
		errc(1, error, "pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS)");
	for (;;)
		;
}

int
main(int argc, char *argv[])
{
	int error;
	pthread_t td;

	/* Create a thread, let it spin for about a second, then reap it. */
	for (;;) {
		error = pthread_create(&td, NULL, threadfunc, NULL);
		if (error != 0)
			errc(1, error, "pthread_create");
		sleep(1);
		error = pthread_cancel(td);
		if (error != 0)
			errc(1, error, "pthread_cancel");
		error = pthread_join(td, NULL);
		if (error != 0)
			errc(1, error, "pthread_join");
	}

	return 0;
}
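
To reproduce, compile it with something like "cc -o spinthreads
spinthreads.c -lpthread" (the file name is made up) and watch
"ps -o time -p <pid>" while it runs: the total drops back each time a
thread is cancelled and reaped, while the ^T status line and the other
rusage-based numbers keep increasing. Adding H (e.g.
"ps -Ho lwp,time -p <pid>") shows the per-thread values being summed.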

-- 
Jilles Tjoelker

