stable/10: high load average when box is idle

Ian Smith smithi at nimnet.asn.au
Tue Nov 3 18:00:58 UTC 2015


On Mon, 26 Oct 2015 22:05:08 -0700, Jeremy Chadwick wrote:

 > (I am not subscribed to the mailing list, please keep me CC'd)
 > 
 > Issue: a stable/10 system that has an abnormally high load average (e.g.
 > 0.15, but may be higher depending on other variables which I can't
 > account for) when the machine is definitely idle (i.e. cannot be traced
 > to high interrupt usage per vmstat -i, cannot be traced to a userland
 > process or kernel thread, etc.).
 > 
 > This problem has been discussed many times on the FreeBSD mailing lists
 > and the FreeBSD forum (including some folks seeing it on 9.x, but my
 > complaint here is focused on 10.x so please focus there).
 > 
 > I'd politely like to request that anyone experiencing this, or who has
 > experienced it (and if you know when it stopped or why, including what
 > you may have done, include that), to chime in on this ticket from 2012
 > (made for 9.x but style of issue still applies; c#5 is quite valid):
 > 
 > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
 > 
 > For those still experiencing it, I'd suggest reading c#8 and seeing if
 > sysctl kern.eventtimer.periodic=1 relieves the problem for you.  (At
 > this time I would not suggest leaving that set indefinitely, as it does
 > seem to increase the interrupt rate in cpuX:timer in vmstat -i.  But for
 > me kern.eventtimer.periodic=1 "fixes" the issue)

Jeremy, I've waited till now to reply, despite really welcoming serious 
discussion of these LA issues.  Rather than joining the PR thread just 
yet, I'm going to report a short summary of findings, having monitored 
this ever since it turned up, originally noticed on 9.2 about 2 years 
ago; I have an embarrassing amount of data to report when appropriate.

Like yourself, I think this is far from 'cosmetic' as is oft suggested, 
especially in some fairly ill-informed forum posts but also various list 
posts.  I've been watching load averages since OS/2 through FreeBSD 2.2 
till present and some Linux systems, and have never before seen anything 
like this, especially on virtually entirely idle systems.  While the LAs
reported during (say) make -j4 buildworld appear more sane, who knows?

I'm not suggesting that I think there's any sort of performance hit from 
this; so far I don't, but 'cosmetic' suggests that it doesn't matter ..

I've just brought my 9.3-R Lenovo X200 up to stable/9 to be sure nothing 
had changed in this respect; it hasn't.  So despite your desire to focus 
on 10.x, and your issue being with systems using LAPIC event timers, I 
think there are related - but different - manifestations of the issue/s.

Firstly, the '0.6' LA at idle often seen in the various reports you've 
so thoroughly referenced including bug 173541, seems to occur most often 
(if not entirely) on laptops with Core 2 and Core 2 Duo processors that 
by default use the HPET as the event timer (and not in per-cpu mode, 
as eventtimers(4) would imply).  I see long-term 15 minute LAs of ~0.55 
to ~0.67, from completely idle to idle with X running, 200% idle shown.

On these systems, switching to using LAPIC does indeed 'fix' the issue, 
resulting in genuine 0.00 0.00 0.00 LAs when completely idle, with say 
0.10-0.15 short-term when poking it a bit with X running - however C3 is 
unavailable using LAPIC, pushing power consumption on battery from less 
than 8W to almost 12W (idle, screen off, lid down) ie ~+50% consumption 
or ~-33% battery life - so is not a viable solution for laptops actually 
used away from mains power.  Correspondingly, system temperatures are 
consistently up to 20% higher - on AC or battery - when using LAPIC.
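
For anyone checking the arithmetic, here is a minimal sketch in Python. 
It assumes idle draw of exactly 8W with HPET and 12W with LAPIC (the 
figures above are "less than 8W" and "almost 12W", so these are rounded) 
and a fixed battery capacity; the function name is only illustrative.

```python
def battery_impact(base_watts, new_watts):
    """Return (fractional rise in power draw, fractional drop in runtime).

    Assumes a fixed battery capacity, so runtime scales as 1 / watts.
    """
    rise = new_watts / base_watts - 1.0
    drop = 1.0 - base_watts / new_watts
    return rise, drop


# Rounded figures from the measurements above (assumed, not exact).
rise, drop = battery_impact(8.0, 12.0)
print(f"power draw: {rise:+.0%}, battery life: {-drop:+.0%}")
```

The two percentages differ because runtime is inversely proportional to 
draw: half as much power again costs a third of the runtime, not half.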

So your problem occurs while using LAPIC, mine and others' while using 
HPET, and despite some semi-obvious relatedness, there are differences. 
I don't want to tromp on or confuse the issue you're chasing, yet think 
there are some underlying - possibly just mathematical? - issues here.
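
To illustrate what "just mathematical" could mean, here is a rough 
Python sketch of an exponentially damped load average. It assumes the 
classic BSD scheme: the run queue length is sampled about every 5 
seconds and folded into 1, 5 and 15 minute averages. The sample pattern 
is invented for illustration and is not measured data, and the function 
and constant names are my own, not the kernel's.

```python
import math

SAMPLE_SECONDS = 5  # assumed sampling interval


def damped_average(samples, window_seconds):
    """Fold run-queue samples into an exponentially damped average."""
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    avg = 0.0
    for nrun in samples:
        avg = avg * decay + nrun * (1.0 - decay)
    return avg


# Hypothetical idle box where the sampler happens to find one runnable
# thread on 3 of every 5 samples, although CPU usage is close to zero.
pattern = [1, 1, 1, 0, 0]
samples = pattern * 2000  # roughly 14 hours of 5-second samples

la15 = damped_average(samples, 900)
print(f"15-minute load average: {la15:.2f}")
```

Under those assumptions the 15 minute figure settles close to 0.6 while 
the box does essentially no work. The point is only that a long-term LA 
near 0.6 is what you would get if the sampler caught something runnable 
on about 60% of its wakeups; it says nothing about why that would 
happen with one event timer and not the other.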

I've spent some days trying to follow the HPET and LAPIC code, but it's 
been too long since I thought I had a tiny grip on any of this, before 
mav@ turned it on its head - for which we're of course grateful :) eg:
 /sys/x86/x86/local_apic.c
 /sys/amd64/include/apicvar.h
 /sys/dev/acpica/acpi_hpet.[ch]

There are some other differences between HPET and LAPIC behaviour on 
mine, best seen in systat -vm, especially regarding interrupt usage, 
both in one-shot and periodic modes for both, that require posting quite 
a lot of data .. so I guess I'm wondering if that's appropriate now?

I should add that I'm having some health issues that make it difficult 
for me to spend much time digging deeply into code or much testing, so 
that's made me a bit reluctant to dive into this .. but I can't resist!

cheers, Ian

