Consistently "high" CPU load on 10.0-STABLE
Steven Hartland
killing at multiplay.co.uk
Sun Jul 20 19:09:58 UTC 2014
Hmm, does vmstat -s indicate that 10 is swapping much more than 9?
I know there were some issues in early 10 which resulted in it
swapping more, but I was under the impression this was fixed by:
http://svnweb.freebsd.org/base?view=revision&revision=265944
http://svnweb.freebsd.org/base?view=revision&revision=265945
Maybe there's still an issue there?
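
A quick way to compare might be something along these lines on both boxes
(rough sketch; the exact counter names may differ slightly between 9 and 10):

  vmstat -s | grep -i 'swap pager'   # lifetime pager counters since boot
  swapinfo                           # current swap usage

If the 10 box shows the pageout counters climbing while the 9 box stays at
or near zero, that would point back at the pager.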
Regards
Steve
----- Original Message -----
From: "Jeremy Chadwick" <jdc at koitsu.org>
To: "Steven Hartland" <killing at multiplay.co.uk>
Cc: <freebsd-stable at freebsd.org>
Sent: Sunday, July 20, 2014 6:35 PM
Subject: Re: Consistently "high" CPU load on 10.0-STABLE
> Yes and no, heh... :-)
>
> Using top -a -H -S -z -s 1 and watching very very closely, what I end up
> seeing is that occasionally syncer reaches WCPU percentages of around
> 0.50% (or maybe higher) -- but when that happens, the actual load average
> **does not** suddenly increase.
>
> The load just seems to go from like 0.01 or 0.02 to 0.12 or 0.20
> sporadically, with no real evidence of "why" in top. Possibly this is
> because I'm using -s 1 (one-second update intervals) and whatever
> happens is so brief (less than a second) that top doesn't catch it,
> but it still impacts the load?
>
> Anyway, the "top" processes per TIME in the above command are here:
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    17 root       16    -     0K    16K syncer 2   1:31  0.49% syncer
>     0 root      -16    0     0K  4912K swapin 0   1:04  0.00% kernel{swapper}
>     0 root      -92    0     0K  4912K -      2   0:17  0.00% kernel{em0 que}
>  1767 root       20    0 57312K 10104K select 1   0:15  0.00% smbd
>    12 root      -60    -     0K   528K WAIT   0   0:13  0.00% intr{swi4: clock}
>   643 dhcpd      20    0 24524K 13044K select 1   0:07  0.00% dhcpd
>    12 root      -88    -     0K   528K WAIT   1   0:04  0.00% intr{irq259: ahci0:ch}
>    14 root      -16    -     0K    16K -      2   0:04  0.00% rand_harvestq
> 58515 jdc        20    0 37028K  6196K select 2   0:03  0.00% sshd
>   420 bind       20    0 70872K 36552K kqread 0   0:02  0.00% named{named}
>    19 root       20    -     0K    16K sdflus 2   0:02  0.00% softdepflush
>   420 bind       20    0 70872K 36552K uwait  3   0:02  0.00% named{named}
>
> This is with a system uptime of 14 hours.
>
> Comparatively, the RELENG_9 box I have (although it's on a VPS), which
> does a lot more in general and has an uptime of 39 days, shows something
> like this:
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    12 root      -60    -     0K   224K WAIT   0 132:49  0.00% intr{swi4: clock}
>     0 root      -92    0     0K   160K -      0  89:01  0.00% kernel{em0 taskq}
>    17 root       16    -     0K    16K syncer 0  45:46  0.00% syncer
>    12 root      -88    -     0K   224K WAIT   1  16:31  0.00% intr{irq14: ata0}
>    12 root      -60    -     0K   224K WAIT   1  12:52  0.00% intr{swi4: clock}
>    13 root       -8    -     0K    48K -      1   8:03  0.00% geom{g_down}
> 15490 halbot     20    0 76172K 22532K select 0   6:34  0.00% perl
>   593 bind       20    0 92288K 23740K uwait  1   4:39  0.00% named{named}
>   593 bind       20    0 92288K 23740K uwait  0   4:36  0.00% named{named}
>
> So syncer looks like it might be about right for both systems, but
> "swapper" (still not sure what that is exactly) sure seems to be much
> busier on the RELENG_10 box, which does basically nothing, than on the
> RELENG_9 box. Here's the swapper line from the RELENG_9 box:
>
> $ top -a -H -S -b 9999 | grep swapper
> 0 root -16 0 0K 160K swapin 0 0:55 0.00% kernel{swapper}
>
> It's the only suspect I have at this point but it's not very good
> evidence of anything. :/
>
> Maybe I can use while : ; do top -a -S -H -z -b 99999 >> somelog.txt ;
> sleep 0.25 ; done and let that run for a while in hopes of catching
> something? But I also worry that such a test would actually impact the
> load itself.
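>
> If I do try that, this is probably what I'd run (same idea, just with a
> timestamp per sample so I can line things up with the load spikes
> afterwards):
>
>   # one full thread listing roughly every quarter second, each preceded
>   # by a timestamp
>   while : ; do
>     date '+%F %T'
>     top -a -S -H -z -b 99999
>     sleep 0.25
>   done >> somelog.txt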
>
> On Sun, Jul 20, 2014 at 03:26:02PM +0100, Steven Hartland wrote:
>> If you add -H -z to your top command, does anything stand out?
>>
>> Regards
>> Steve
>> ----- Original Message -----
>> From: "Jeremy Chadwick" <jdc at koitsu.org>
>> To: <freebsd-stable at freebsd.org>
>> Sent: Sunday, July 20, 2014 7:24 AM
>> Subject: Consistently "high" CPU load on 10.0-STABLE
>>
>>
>> > (Please keep me CC'd as I'm not subscribed to freebsd-stable@)
>> >
>> > Today I took the liberty of upgrading my main home server from
>> > 9.3-STABLE (r268785) to 10.0-STABLE (r268894). The upgrade consisted of
>> > doing a fresh install of 10.0-STABLE on a brand new, unused SSD. Most
>> > everything went as planned, barring a couple of ports-related anomalies,
>> > and I was fairly impressed that buildworld times had dropped to
>> > 27 minutes and buildkernel to 4 minutes with clang (something I'd been
>> > avoiding like the plague for a long while). Kudos.
>> >
>> > But after an hour or so, I noticed a consistent (i.e. reproducible)
>> > trend: the system load average tends to hang around 0.10 to 0.15 all the
>> > time. There are times when the load drops to 0.03 or 0.04, but then
>> > something kicks it back up to 0.15 or 0.20, it slowly levels out again
>> > (over the course of a few minutes), and the cycle repeats.
>> >
>> > Obviously this is normal behaviour for a system when something is going
>> > on periodically. So I figured it might have been a userland process
>> > behaving differently under 10.x than under 9.x. I let top -a -S -s 1 run
>> > and paid very close attention to it for several minutes. Nothing. It
>> > doesn't appear to be anything in userland -- it appears to be something
>> > kernel-level, but nothing in top -S shows up as taking up any CPU time
>> > other than "[idle]", so I have no idea what might be doing it.
>> >
>> > The box isn't doing anything like routing network traffic or NAT; it's
>> > pure IPv4 (IPv6 disabled in world and kernel, and my home network does
>> > basically no IPv6) and it sits idle most of the time, fetching mail. It
>> > does use ZFS, but not for /, swap, /var, /tmp, or /usr.
>> >
>> > vmstat -i doesn't particularly show anything awful. All the cpuX:timer
>> > entries tend to fluctuate in rate, usually 120-200 or so; I'd expect an
>> > interrupt storm to show something in the 1000+ range.
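>> >
>> > If the raw numbers would help anyone, a loop along these lines should
>> > capture the per-interrupt rates over time (untested sketch, nothing fancy):
>> >
>> >   while : ; do date ; vmstat -i | egrep 'interrupt|timer' ; sleep 5 ; done >> vmstat_i.log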
>> >
>> > The only thing I can think of is the fact that the SSD being used has no
>> > 4K quirk entry in the kernel (and its ATA IDENTIFY responds with 512
>> > logical, 512 physical, even though we know it's 4K). The partitions are
>> > all 1MB-aligned regardless.
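>> >
>> > For anyone who wants to double-check me on the alignment, something like
>> > this shows the partition offsets and what the drive reports (ada0 here is
>> > just a placeholder for whatever device the SSD shows up as):
>> >
>> >   gpart show -p ada0   # partition offsets, in 512-byte sectors
>> >   diskinfo -v ada0     # sectorsize / stripesize as reported by the drive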
>> >
>> > This is all bare-metal, by the way -- no virtualisation involved.
>> >
>> > I do have DTrace enabled/built on this box, but I have absolutely no clue
>> > how to go about profiling things. For example, maybe output of this sort
>> > would be helpful (but I've no idea how to get it):
>> >
>> > http://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079276.html
>> >
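>> > If that sort of output comes from the usual DTrace profile provider, my
>> > best guess (untested, so please correct the invocation) would be something
>> > like this to sample kernel stacks for a minute and dump the aggregate:
>> >
>> >   # sample kernel stacks at ~997 Hz for 60s, print counts per unique stack
>> >   dtrace -x stackframes=100 \
>> >     -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
>> >     -o kernel_stacks.txt
>> >
>> > That should at least name whatever keeps waking up, if it's kernel-side.
>> >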
>> > I'm certain I didn't see this behaviour in 9.x so I'd be happy to try
>> > and track it down if I had a little bit of hand-holding.
>> >
>> > I've put all the things I can think of that might be relevant to "system
>> > config/tuning bits" up here:
>> >
>> > http://jdc.koitsu.org/freebsd/releng10_perf_issue/
>> >
>> > I should note my kernel config is slightly inaccurate (I've removed some
>> > stuff from the file in an attempt to rebuild, but building world prior to
>> > the kernel failed due to r268896 breaking world; anyone subscribed here
>> > has already seen the Jenkins job for that ;-) ).
>> >
>> > Thanks.
>> >
>> > --
>> > | Jeremy Chadwick jdc at koitsu.org |
>> > | UNIX Systems Administrator http://jdc.koitsu.org/ |
>> > | Making life hard for others since 1977. PGP 4BD6C0CB |
>> >
>
> --
> | Jeremy Chadwick jdc at koitsu.org |
> | UNIX Systems Administrator http://jdc.koitsu.org/ |
> | Making life hard for others since 1977. PGP 4BD6C0CB |
>