stopped processes using cpu?

Wed Aug 20 16:00:44 UTC 2014

On Wednesday, August 20, 2014 9:17:06 am Ian Lepore wrote:
> On Tue, 2014-08-19 at 18:45 -0700, Tim Kientzle wrote:
> > On Aug 19, 2014, at 12:28 PM, Allan Jude <allanjude at freebsd.org> wrote:
> > 
> > > On 2014-08-19 15:21, Dieter BSD wrote:
> > >> 8.2 on amd64
> > >> Top(1) with no arguments reports that some firefox processes are using cpu
> > >> despite being stopped (via kill -stop pid) for at least several hours.
> > >> Adding -C doesn't change the numbers.  Ps(1) reports the same.
> > >> Interestingly, a firefox that isn't stopped is (correctly?) reported as
> > >> using 0 cpu.  The 100% idle should be correct, but who knows.
> > >> 
> > >> last pid: 51932;  load averages:  0.07, 0.99, 1.42 up 14+19:02:56  08:48:28
> > >> 267 processes: 1 running, 138 sleeping, 128 stopped
> > >> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > >> Mem: 1665M Active, 653M Inact, 240M Wired, 95M Cache, 372M Buf, 815M Free
> > >> Swap: 8965M Total, 560K Used, 8965M Free
> > >> 
> > >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > >> 44188 a           9  44    0   303M   187M STOP   113:19 13.43% firefox-bin
> > >> 92986 b          11  44    0   164M 62848K STOP     0:18  5.03% firefox-bin
> > >> 16507 c          11  44    0   189M 88976K STOP     0:13  0.24% firefox-bin
> > >>  2265 root        1  44    0   248M   193M select 625:38  0.00% Xorg
> > >> 51271 d          10  44    0   233M   128M ucond   12:12  0.00% firefox-bin
> > > 
> > > I wonder if jhb@'s new top code solves this. He adjusted the way CPU
> > > usage is tracked to be more responsive, and not based on averages
> > 
> > I wonder if jhb@’s new top code fixes the whacky WCPU values we’ve been
> > seeing on FreeBSD/ARM.  (1713% CPU is a little hard to believe on a
> > single-core board ;-).
> > 
> > Tim
> > 
> 
> *Fixes* it?  I've been under the impression those changes caused it.  I
> certainly never saw 1000%+ numbers in top until very recently.

Yes, if it's a recent change then my changes are to blame.  In both cases
the numbers are imprecise.  The older code still in stable@ (as in the OP)
takes a long time to ramp up and down.  So in this case the processes really
are stopped (no, there's no rootkit), but the scheduler takes a long time to
factor that into its decayed %CPU computation.
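
To illustrate why a decayed estimate lags behind reality, here is a minimal
sketch of a generic exponentially decayed average.  The DECAY constant and
the function names are assumptions made up for this example; this is not the
scheduler's actual code, and the real constants and update intervals differ.

```c
#include <assert.h>

/* Hypothetical per-update decay factor; NOT the kernel's real constant. */
#define DECAY 0.95

/* One update step: blend the old estimate with the newest sample. */
static double
decay_step(double est, double sample)
{
	return (DECAY * est + (1.0 - DECAY) * sample);
}

/*
 * Count how many updates with a sample of 0 (process not running) it
 * takes for the estimate to fall below the given threshold.
 */
static int
steps_until_below(double est, double threshold)
{
	int n = 0;

	assert(threshold > 0.0);	/* otherwise the loop never ends */
	while (est >= threshold) {
		est = decay_step(est, 0.0);
		n++;
	}
	return (n);
}
```

With these made-up numbers, an estimate of 100% needs 90 idle updates before
it drops below 1%, so the reported value trails the real state of the
process by many update intervals.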

In the "new" code, the problem is that fetching the kinfo_proc and the
current timestamp for that kinfo_proc is not atomic.  I have thought
about "fixing" that by embedding a new timeval in kinfo_proc that is
stamped with the time the individual kinfo_proc is generated.  This would
(I believe) alleviate the noise in the new code as the delta in walltime
at the "bottom" of the ratio would then correspond to the delta in runtime
on the "top".
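
A minimal sketch of that ratio, assuming each snapshot carries its own
timestamp.  The struct and its field names are hypothetical and are not
taken from kinfo_proc; the point is only that both deltas come from the same
pair of snapshots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one process; not the real kinfo_proc layout. */
struct snap {
	uint64_t runtime_us;	/* accumulated CPU time of the process */
	uint64_t stamp_us;	/* time at which runtime_us was read */
};

/* %CPU over the interval between two snapshots of the same process. */
static double
pct_cpu(const struct snap *prev, const struct snap *cur)
{
	uint64_t drun, dwall;

	assert(prev != NULL && cur != NULL);
	drun = cur->runtime_us - prev->runtime_us;	/* "top" of the ratio */
	dwall = cur->stamp_us - prev->stamp_us;	/* "bottom" of the ratio */
	if (dwall == 0)
		return (0.0);
	return (100.0 * (double)drun / (double)dwall);
}
```

If the timestamp is instead read separately from the runtime, the bottom of
the ratio no longer matches the top.  For example, 500000 us of runtime over
a true interval of 1000000 us is 50%, but if the measured interval comes out
as 250000 us the same runtime is reported as 200%.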

However, trying to store a timeval in kinfo_proc is quite tricky as all the
available fields are things like ints and longs.  I could perhaps split it
up into two longs which is kind of fugly.  Another option would be to just 
generate a single long that holds raw nanoseconds uptime and store that
(wrapping would be ok since I would only care about deltas).
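
A minimal sketch of why wrapping is harmless for deltas, assuming the
counter is stored in an unsigned 32-bit field (as a long would be on a
32-bit platform).  The function names are hypothetical.  Unsigned
subtraction is performed modulo 2^32, so the difference is correct as long
as less than one full wrap has elapsed between the two readings.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed nanoseconds between two readings of a wrapping 32-bit counter. */
static uint32_t
ns_delta(uint32_t earlier, uint32_t later)
{
	return (later - earlier);	/* modulo 2^32, so a wrap cancels out */
}

/* Usage: one reading taken just before the wrap and one just after. */
static void
ns_delta_example(void)
{
	uint32_t before = 0xFFFFFF00u;
	uint32_t after = 0x00000100u;

	assert(ns_delta(before, after) == 0x200u);
}
```

One caveat: 2^32 nanoseconds is only about 4.29 seconds, so with a 32-bit
field the two readings would have to be closer together than that.  A 64-bit
field does not have this problem in practice.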

-- 
John Baldwin