stopped processes using cpu?

Wed Sep 3 20:06:27 UTC 2014

On Wednesday, August 20, 2014 11:38:40 AM John Baldwin wrote:
> On Wednesday, August 20, 2014 9:17:06 am Ian Lepore wrote:
> > On Tue, 2014-08-19 at 18:45 -0700, Tim Kientzle wrote:
> > > On Aug 19, 2014, at 12:28 PM, Allan Jude <allanjude at freebsd.org> wrote:
> > > > On 2014-08-19 15:21, Dieter BSD wrote:
> > > >> 8.2 on amd64
> > > >> Top(1) with no arguments reports that some firefox processes are
> > > >> using cpu despite being stopped (via kill -stop pid) for at least
> > > >> several hours.
> > > >> Adding -C doesn't change the numbers.  Ps(1) reports the same.
> > > >> Interestingly, a firefox that isn't stopped is (correctly?) reported
> > > >> as using 0 cpu.  The 100% idle should be correct, but who knows.
> > > >> 
> > > >> last pid: 51932;  load averages:  0.07, 0.99, 1.42 up 14+19:02:56 08:48:28
> > > >> 267 processes: 1 running, 138 sleeping, 128 stopped
> > > >> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > > >> Mem: 1665M Active, 653M Inact, 240M Wired, 95M Cache, 372M Buf, 815M Free
> > > >> Swap: 8965M Total, 560K Used, 8965M Free
> > > >> 
> > > >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > > >> 44188 a           9  44    0   303M   187M STOP   113:19 13.43% firefox-bin
> > > >> 92986 b          11  44    0   164M 62848K STOP     0:18  5.03% firefox-bin
> > > >> 16507 c          11  44    0   189M 88976K STOP     0:13  0.24% firefox-bin
> > > >>  2265 root        1  44    0   248M   193M select 625:38  0.00% Xorg
> > > >> 51271 d          10  44    0   233M   128M ucond   12:12  0.00% firefox-bin
> > > > 
> > > > I wonder if jhb@'s new top code solves this. He adjusted the way CPU
> > > > usage is tracked to be more responsive, and not based on averages
> > > 
> > > I wonder if jhb@’s new top code fixes the whacky WCPU values we’ve been
> > > seeing on FreeBSD/ARM.  (1713% CPU is a little hard to believe on a
> > > single-core board ;-).
> > > 
> > > Tim
> > 
> > *Fixes* it?  I've been under the impression those changes caused it.  I
> > certainly never saw 1000%+ numbers in top until very recently.
> 
> Yes, if it's a recent change then mine are to blame.  In both cases the
> numbers are imprecise.  The older code still in stable@ (as in the OP),
> takes a long time to ramp up and down.  So in this case the processes are
> stopped (no, there's no rootkit), but the scheduler takes a long time to
> factor that into its decayed %CPU computation.
> 
> In the "new" code, the problem is that fetching the kinfo_proc and the
> current timestamp for that kinfo_proc is not atomic.  I have thought
> about "fixing" that by embedding a new timeval in kinfo_proc that is
> stamped with the time the individual kinfo_proc is generated.  This would
> (I believe) alleviate the noise in the new code as the delta in walltime
> at the "bottom" of the ratio would then correspond to the delta in runtime
> on the "top".
> 
> However, trying to store a timeval in kinfo_proc is quite tricky as all the
> available fields are things like ints and longs.  I could perhaps split it
> up into two longs which is kind of fugly.  Another option would be to just
> generate a single long that holds raw nanoseconds uptime and store that
> (wrapping would be ok since I would only care about deltas).

So I tried this and the results aren't a lot better.  I think the problem now 
is that rufetch() doesn't force an update of the target thread's stats to
"now" (the way getrusage() does for curthread).  Because the idle thread runs
constantly when idle, it is especially prone to this imprecision.  I'm not 
sure of a good way to fix this.  Having a per-thread timestamp that was 
updated each time the runtime was updated would help for a currently-running
thread perhaps.  Another option would be to use an IPI (ewww) to force
currently running threads to update their runtime when the sysctl runs.  That
seems a bit expensive though.  (I might at least try it to see if it does
resolve it to verify my understanding of the issue.)
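
To make the imprecision concrete, here is a minimal sketch of the ratio in
question, 100 * delta(runtime) / delta(walltime), written as plain userland C.
The struct and function names are hypothetical and are not the real
kinfo_proc layout or top(1) code; the numbers in the note below are made up.
It assumes, as described above, that a running thread's recorded runtime can
lag behind "now" when it is read.

```c
#include <stdint.h>

/* Hypothetical snapshot of one thread; NOT the real kinfo_proc layout. */
struct sample {
	uint64_t runtime_ns;	/* CPU time as last recorded for the thread */
	uint64_t stamp_ns;	/* uptime the reader pairs with that runtime */
};

/*
 * %CPU over one interval in hundredths of a percent:
 * 100 * delta(runtime) / delta(walltime).
 * The subtractions are unsigned, so a wrapped counter still yields the
 * right delta as long as the interval itself fits in 64 bits.
 */
static uint64_t
pct_cpu_x100(struct sample prev, struct sample cur)
{
	uint64_t run = cur.runtime_ns - prev.runtime_ns;
	uint64_t wall = cur.stamp_ns - prev.stamp_ns;

	if (wall == 0)
		return (0);	/* no elapsed time, nothing to report */
	return (run * 10000 / wall);
}
```

Suppose a thread runs flat out, the reader samples at uptime 1.0s and 2.0s,
but the recorded runtime was last brought up to date at 0.4s and 1.9s.
Pairing the runtimes (0.4s, 1.9s) with the reader's stamps (1.0s, 2.0s) gives
1.5s / 1.0s, i.e. 150%.  Pairing them with the times at which the runtime was
actually updated (0.4s, 1.9s) gives 1.5s / 1.5s, i.e. 100%, which is what a
per-thread timestamp would buy for a currently-running thread.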

-- 
John Baldwin