Consistently "high" CPU load on 10.0-STABLE

Jeremy Chadwick jdc at koitsu.org
Sun Jul 20 23:35:58 UTC 2014


On Mon, Jul 21, 2014 at 12:16:48AM +0100, Steven Hartland wrote:
> 
> ----- Original Message ----- 
> From: "Jeremy Chadwick" <jdc at koitsu.org>
> To: "Adrian Chadd" <adrian at freebsd.org>
> Cc: "Steven Hartland" <killing at multiplay.co.uk>; "FreeBSD Stable Mailing List" <freebsd-stable at freebsd.org>
> Sent: Sunday, July 20, 2014 11:58 PM
> Subject: Re: Consistently "high" CPU load on 10.0-STABLE
> 
> 
> > On Sun, Jul 20, 2014 at 03:09:55PM -0700, Adrian Chadd wrote:
> >> hi,
> >> 
> >> it looks like a whole lot of things are waking up at the same time:
> >> 
> >> * dhcpd
> >> * em
> >> * usb devices
> >> 
> >> So, do you have some shared interrupts going on here? That seems to
> >> be what's causing everything to wake up at once.
> > 
> > I forget how to get an interrupt mapping out of the I/O APIC, but
> > dmesg indicates the following.  It's sorted by IRQ so you can tell
> > what's associated with what, and it compares RELENG_10 vs. RELENG_9
> > (I have an old dmesg.today from this box running RELENG_9); a
> > one-liner for spotting shared IRQs follows the table.  All the IRQs
> > match up:
> > 
> > dev       RELENG_10      RELENG_9
> > --------  -------------  -------------
> > ioapic0   IRQs  0 to 23  IRQs  0 to 23 (same)
> > ioapic1   IRQs 24 to 47  IRQs 24 to 47 (same)
> > attimer0  IRQ 0          IRQ 0 (same)
> > atkbdc0   IRQ 1          IRQ 1 (same)
> > atkbd0    IRQ 1          IRQ 1 (same)
> > uart1     IRQ 3          IRQ 3 (same)
> > uart0     IRQ 4          IRQ 4 (same)
> > atrtc0    IRQ 8          IRQ 8 (same)
> > em0       IRQ 16         IRQ 16 (same)
> > pcib1     IRQ 16         IRQ 16 (same)
> > pcib3     IRQ 16         IRQ 16 (same)
> > pcib4     IRQ 16         IRQ 16 (same)
> > uhci0     IRQ 16         IRQ 16 (same)
> > ahci0     IRQ 17         IRQ 17 (same)
> > em1       IRQ 17         IRQ 17 (same)
> > ichsmb0   IRQ 17         IRQ 17 (same)
> > pcib5     IRQ 17         IRQ 17 (same)
> > uhci1     IRQ 17         IRQ 17 (same)
> > ehci0     IRQ 18         IRQ 18 (same)
> > uhci2     IRQ 18         IRQ 18 (same)
> > uhci5     IRQ 18         IRQ 18 (same)
> > siis0     IRQ 21         IRQ 21 (same)
> > uhci4     IRQ 22         IRQ 22 (same)
> > ehci1     IRQ 23         IRQ 23 (same)
> > uhci3     IRQ 23         IRQ 23 (same)
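> > 
> > A quick sketch for spotting the sharing without building this table
> > by hand (assuming a single boot's worth of dmesg buffer; the grep
> > pattern may need tweaking for other drivers).  Any count above 1
> > means the IRQ is shared:
> > 
> > $ dmesg | grep -oE 'irq [0-9]+' | sort | uniq -c | sort -rn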
> > 
> > And here are the higher-numbered IRQs (MSI vectors, so no sharing
> > there) per vmstat -i.  I only have this for RELENG_10, however:
> > 
> > irq256: em0                      1848856         26
> > irq259: ahci0:ch0                 273086          3
> > irq260: ahci0:ch1                   9990          0
> > irq261: ahci0:ch2                  48514          0
> > irq262: ahci0:ch3                  48046          0
> > irq263: ahci0:ch4                  48258          0
> > irq264: ahci0:ch5                  48052          0
> > 
> > Getting this out of vmstat -i is kinda painful (I discussed this with
> > jhb@ in the past, re: the kernel just appending "+" to the string to
> > indicate "many things using this IRQ").
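> > 
> > For a shared vector the name column ends up looking roughly like
> > this (illustrative, typed from memory; note the trailing "+"):
> > 
> > irq16: uhci0 pcib1+               123456          1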
> > 
> > I have absolutely no USB devices attached to the system (meaning there are
> > USB controllers and ports, yeah, but nothing attached to any of them).
> > The keyboard is PS/2.  All disks are on ahci0 (no disks currently
> > attached to siis0).
> > 
> > As for dhcpd: I don't know how that'd be responsible.  If I stop the
> > process entirely I still see the problem.
> > 
> > I can provide some more ktrdumps, along with turning off as many daemons
> > + cron jobs as I can, if you feel that'd be helpful.
> > 
> > Likewise I can provide an ACPI DSDT dump if that would be useful (maybe
> > to someone else).
> > 
> > I haven't tried booting the box in single-user and letting it sit
> > idle to see if anything shows up.
> > 
> > In the interim I wrote the perl script I mentioned in my mail to Steve.
> > When the load shoots up, there is literally no field in "vmstat -s"
> > that consistently shows a humongous increase (or decrease); I'd say
> > 95% of the time that there's a sudden load jump, I can't correlate
> > any of those statistics with it.  It's a pretty "meh" script, but it
> > does the job of showing deltas between vmstat -s runs and visually
> > flagging jumps in the 1-minute load average.  It requires a VERY wide
> > terminal (about 301 characters):
> > 
> > http://jdc.koitsu.org/freebsd/releng10_perf_issue/load_vmstat.pl
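> > 
> > The core of it boils down to something like this plain-sh sketch
> > (no highlighting, no wide layout; the 5-second interval and the
> > temp-file handling are arbitrary):
> > 
> > #!/bin/sh
> > # Crude vmstat -s delta watcher: every 5 seconds, print the load
> > # averages plus whichever counters changed since the last sample.
> > prev=$(mktemp -t vmstat)
> > cur=$(mktemp -t vmstat)
> > vmstat -s > "$prev"
> > while sleep 5; do
> >         vmstat -s > "$cur"
> >         sysctl -n vm.loadavg
> >         diff "$prev" "$cur" | grep '^>'
> >         cp "$cur" "$prev"
> > done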
> > 
> > Some example output from the perl script is here (obviously you
> > can't see the red+bold highlighting of the jump lines):
> > 
> > http://jdc.koitsu.org/freebsd/releng10_perf_issue/example_data.txt
> > 
> > Load jumps at the following time indexes:
> > 
> > 124.0  (from 0.02 to 0.10, load delta: 0.08)
> > 153.0  (from 0.06 to 0.14, load delta: 0.08, time delta: 29.0 sec)
> > 178.5  (from 0.10 to 0.17, load delta: 0.07, time delta: 25.5 sec)
> > 217.0  (from 0.09 to 0.17, load delta: 0.08, time delta: 38.5 sec)
> > 236.0  (from 0.12 to 0.19, load delta: 0.07, time delta: 19.0 sec)
> > 244.0  (from 0.17 to 0.24, load delta: 0.07, time delta:  8.0 sec)
> > 259.0  (from 0.20 to 0.27, load delta: 0.07, time delta: 15.0 sec)
> > 284.5  (from 0.19 to 0.25, load delta: 0.06, time delta: 25.5 sec)
> > 310.0  (from 0.18 to 0.25, load delta: 0.07, time delta: 25.5 sec)
> > 341.5  (from 0.27 to 0.33, load delta: 0.06, time delta: 31.5 sec)
> > 
> > Some of these could be due to cron jobs I run (though those really
> > aren't that intensive on disk, CPU, or memory), but there's a pretty
> > consistent load pattern going on there.  The reason I noted the time
> > deltas was to watch for "periodic tasks", e.g. the ZFS txg flush, but
> > these intervals have a bit too much variance for that.
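> > 
> > For reference, the ZFS txg sync interval is visible via sysctl; the
> > stock value should be 5 seconds, which is shorter than any of the
> > deltas above anyway:
> > 
> > $ sysctl vfs.zfs.txg.timeout
> > vfs.zfs.txg.timeout: 5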
> > 
> > It's just that none of the vmstat -s statistics change rapidly alongside
> > the load.  But I'm sure there are VM bits that aren't tracked in vmstat.
> 
> Not sure if it's in stable/10, but there was some talk about making
> ZFS use lz4 for some things by default; I wonder if that might have
> something to do with it?

I hope not -- both the ZFS pools and their filesystems were created on
RELENG_9 circa January 2014 (according to "zpool history").  Well,
actually, the backups pool was created a few days ago (also on
RELENG_9) due to a disk failure.  ZFS details:

http://jdc.koitsu.org/freebsd/releng10_perf_issue/zfs_get_all.txt
http://jdc.koitsu.org/freebsd/releng10_perf_issue/zpool_status.txt
http://jdc.koitsu.org/freebsd/releng10_perf_issue/zdb_ashift.txt

I do use lz4 compression on two specific filesystems (data/www and
data/cvs); content on data/www is served only by Apache, and data/cvs
isn't touched by anything unless I'm using "cvs" (I do run cvspserver).
I don't use dedup or snapshots.  The backing disks are 4K-sector drives
(FreeBSD's kernel has no 4K quirk entries for them), but ashift is 12
on all of them.
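
For anyone wanting to double-check the ashift claim: zdb with no
arguments dumps the cached pool configs, so a grep over that works
(output trimmed to the relevant line; it repeats per vdev):

$ zdb | grep ashift
            ashift: 12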

If we think it's related to lz4, I can absolutely recreate those
filesystems sans compression with little effort -- they're very small
and contain very little data:

$ zfs list data/www data/cvs
NAME       USED  AVAIL  REFER  MOUNTPOINT
data/cvs   520K  1.27T   520K  /cvs
data/www  25.9M  1.27T  25.9M  /www
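
If it comes to that, the rough procedure would be something like this
(dataset names hypothetical; assumes the pool mounts under /data, so
adjust paths to taste):

$ zfs create -o compression=off data/www.new
$ tar -C /www -cf - . | tar -C /data/www.new -xpf -
$ zfs destroy data/www
$ zfs rename data/www.new data/www
$ zfs set mountpoint=/www data/www

A plain "zfs set compression=off data/www" wouldn't be enough on its
own, since that only affects newly-written blocks; the existing data
would stay lz4-compressed.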

I don't dare run "zpool upgrade" until this can be figured out or at
least determined for sure to be "specific to my environment" (I'll just
go back to RELENG_9 in that case, no worries).

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


