8.0RC2 "top" statistics seem broken

Igor Sysoev is at rambler-co.ru
Thu Nov 12 19:50:23 UTC 2009


On Thu, Nov 12, 2009 at 08:42:28AM -0800, Matthew Fleming wrote:

> > > [snip]
> > > 
> > > Load average and %CPU user are right, as are other global statistics.
> > > The load is produced by the "7z" process (archivers/p7zip) which
> > > compresses some data in two threads but is credited with 0% CPU,
> > > though its runtime is correct (increments every second as it should
> > > in a CPU-bound process). It doesn't help if I expand / show
> > > individual threads.
> > 
> > I believe this is related to multithreaded processes only. I saw this
> > for the intr kernel process. Singlethreaded processes eat slightly
> > less CPU than on 7.2; however, I cannot say whether that is a
> > statistics error or a real speedup. I saw the issue only on SMP/ULE
> > and cannot say anything about UP or the 4BSD scheduler.
> 
> Check out r197652 on stable/7.  I had a similar problem where top was
> showing 0% for a CPU hog, but since I was unable to replicate it on
> CURRENT (and the ULE accounting code is different between releases) I
> only submitted it for stable/7.  The patch should be easy to apply by
> hand for testing, though.

Thank you very much. I have applied your patch and it fixes the bug:

CPU 0: 22.0% user,  0.0% nice,  4.9% system,  0.0% interrupt, 73.2% idle
CPU 1:  1.2% user,  0.0% nice,  1.2% system,  4.9% interrupt, 92.7% idle

  PID USERNAME     THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root           2 171 ki31     0K    32K CPU0    0  12:11 165.77% idle
 1338 nobody         1  44  -10   439M   433M kqread  0   0:24 14.45% nginx
 1339 nobody         1  44  -10   439M   433M kqread  0   0:23 12.89% nginx
   12 root          15 -60    -     0K   240K WAIT    0   0:09  4.39% intr


CPU 0: 16.2% user,  0.0% nice,  8.5% system,  0.8% interrupt, 74.5% idle
CPU 1:  1.2% user,  0.0% nice,  1.9% system,  4.2% interrupt, 92.7% idle

  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        171 ki31     0K    32K RUN     1   6:39 88.96% {idle: cpu1}
   11 root        171 ki31     0K    32K RUN     0   6:11 77.59% {idle: cpu0}
 1338 nobody       44  -10   439M   433M CPU0    0   0:27 14.45% nginx
 1339 nobody       44  -10   439M   433M RUN     1   0:26 14.26% nginx
   12 root        -68    -     0K   240K WAIT    1   0:09  4.69% {irq19: bge0}

The patch against 8.0-PRERELEASE is attached.


-- 
Igor Sysoev
http://sysoev.ru/en/
-------------- next part --------------
--- sys/kern/sched_ule.c	2009-11-02 09:25:28.000000000 +0300
+++ sys/kern/sched_ule.c	2009-11-12 21:53:45.000000000 +0300
@@ -103,6 +103,7 @@
 	u_int		ts_slptime;	/* Number of ticks we vol. slept */
 	u_int		ts_runtime;	/* Number of ticks we were running */
 	int		ts_ltick;	/* Last tick that we were running on */
+	int		ts_incrtick;	/* Last tick that we incremented on */
 	int		ts_ftick;	/* First tick that we were running on */
 	int		ts_ticks;	/* Tick count */
 #ifdef KTR
@@ -1991,6 +1992,7 @@
 	 */
 	ts2->ts_ticks = ts->ts_ticks;
 	ts2->ts_ltick = ts->ts_ltick;
+	ts2->ts_incrtick = ts->ts_incrtick;
 	ts2->ts_ftick = ts->ts_ftick;
 	child->td_user_pri = td->td_user_pri;
 	child->td_base_user_pri = td->td_base_user_pri;
@@ -2182,11 +2184,12 @@
 	 * Ticks is updated asynchronously on a single cpu.  Check here to
 	 * avoid incrementing ts_ticks multiple times in a single tick.
 	 */
-	if (ts->ts_ltick == ticks)
+	if (ts->ts_incrtick == ticks)
 		return;
 	/* Adjust ticks for pctcpu */
 	ts->ts_ticks += 1 << SCHED_TICK_SHIFT;
 	ts->ts_ltick = ticks;
+	ts->ts_incrtick = ticks;
 	/*
 	 * Update if we've exceeded our desired tick threshhold by over one
 	 * second.


More information about the freebsd-stable mailing list