SMP, ACPI and interrupt storm
Tech Lab Manager
tech at liveoaksf.org
Mon Feb 11 20:20:35 UTC 2008
On Feb 4, 2008, at 6:00 AM, John Baldwin wrote:
> On Thursday 31 January 2008 02:35:52 pm Tech Lab Manager wrote:
>> On Jan 31, 2008, at 10:48 AM, Nate Lawson wrote:
>>
>>> Tech Lab Manager wrote:
>>>> Sorry for the cross-post from freebsd-smp.
>>>> Building 6.3-RELEASE and 7.0-RC1 on dual Xeon (4 CPU) boxes:
>>>> options SMP
>>>> device apic
>>>> SMP kernel builds fine, all 4 CPUs launch on reboot.
>>>> But I get a TON of interrupts from acpi0 -- about 67,000 per second
>>>> according to vmstat -i. With system at idle and almost no services
>>>> running, here is output of top -S:
>>>> last pid: 877; load averages: 1.18, 0.48, 0.19
>>>> 75 processes: 6 running, 54 sleeping, 15 waiting
>>>> CPU states: 0.0% user, 0.0% nice, 0.2% system, 22.4% interrupt, 77.4% idle
>>>> Mem: 31M Active, 12M Inact, 28M Wired, 16K Cache, 15M Buf, 3822M Free
>>>> Swap: 4096M Total, 4096M Free
>>>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>>>    10 root        1 171   52     0K     8K RUN    3   1:11 99.18% idle: cpu3
>>>>    13 root        1 171   52     0K     8K CPU0   0   1:10 98.88% idle: cpu0
>>>>    12 root        1 171   52     0K     8K CPU1   1   1:09 98.78% idle: cpu1
>>>>    21 root        1 -52 -171     0K     8K CPU2   2   0:54 87.24% irq9: acpi0
>>>>    11 root        1 171   52     0K     8K RUN    2   0:17 11.19% idle: cpu2
>>>> Notice high load and interrupt % of CPU.
>>>> If I turn off ACPI (e.g. set hint.apic.0.disabled=1 in
>>>> /boot/loader.conf),
>>>> the interrupt storm ceases, but then I'm only running on one CPU.
>>>
>>> That doesn't turn off acpi, that turns off the APIC (interrupt
>>> controller). Try:
>>> hint.acpi.0.disabled=1
>>
>> Sorry, my mistake in writing ACPI above -- I *was* trying to turn off
>> apic, based on a note in the FreeBSD handbook.
>>
>> Disabling ACPI as you suggest above has the same effect as turning
>> off APIC: the interrupt storm is disabled but only one CPU is
>> launched.
>>
>>>
>>>> The BIOS ACPI settings are all Enabled. Hyperthreading is Enabled.
>>>> These machines have been running RedHat Enterprise 5.0 with full
>>>> multiprocessor support.
>>>
>>> This looks like a failure to sleep in C1 (hlt). Someone else
>>> reported this problem earlier, but all debugging showed the
>>> inexplicable -- the HLT instruction was being executed but just did
>>> not work (returned immediately).
>>>
>>> There will be a new 7.0 build that fixes one interrupt storm
>>> related to level-triggered GPEs. If you can cvsup your 7.0 branch
>>> (RELENG_7_0) and retry, that might be helpful to see if it also
>>> fixes your problem.
>>
>> okay, I'm on RC1, will switch to RELENG and report back.
>>
>> I'm not sure if this is a red herring, but acpidump -t reports:
>>
>> Type=INT Override
>> BUS=0
>> IRQ=0
>> INTR=2
>> Flags={Polarity=conforming, Trigger=conforming}
>>
>> which looks wrong on several counts (IRQ, INTR should be 9,
>> Trigger=level). dmesg even says:
>> "MADT: Forcing active-low polarity and level trigger for SCI"
>
> No, this is an entry for something other than the SCI. You can have
> multiple interrupt override entries, and this entry is typical on all
> x86 systems with APICs (the 8259As are tied into pin 0 as a daisy
> chain, and IRQ0 is tied into intpin 2 since IRQ2 isn't usable with
> 8259As). Do you have an entry at all for IRQ 9? If not, then the
> hw.acpi.sci tunables currently won't do anything (I can fix it so
> that they do, however).
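One way to check whether the MADT has an override entry for IRQ 9 is to filter the acpidump -t output. This is only a minimal sketch: it assumes each entry is laid out like the excerpt quoted above (Type=, BUS=, IRQ=, INTR= on separate lines), and the function name override_irqs is made up for illustration.

```shell
#!/bin/sh
# Print the IRQ number of every "INT Override" entry found in
# `acpidump -t` output read from stdin (one number per line).
# Assumption: each entry starts with a "Type=..." line and carries
# its fields on the following lines, as in the excerpt above.
override_irqs() {
    awk '
        /Type=INT Override/ { in_entry = 1; next }  # start of an override entry
        /Type=/             { in_entry = 0 }        # any other entry type ends it
        in_entry && /IRQ=/  { sub(/.*IRQ=/, ""); print $1 }
    '
}

# Usage on a live system (commented out so the file can be sourced safely):
# acpidump -t | override_irqs
```

If 9 does not show up in the output, there is no override entry for IRQ 9 in the table.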
Here's an update on this issue.
I csup'ed my source tree (RELENG_7_0 now at RC2) last Friday and
rebuilt world. Two things look slightly different now:
1) On reboot, I still see an interrupt storm at acpi0 (irq9) at
around 75k/sec; however over time the interrupt rate actually drops,
to around 15k/sec after a few days (perhaps it settles further, time
will tell).
2) load average [at idle] is down quite a bit, from a previous
average of ~1.0 to an average that seems to vacillate between a low
of 0.10 and a high of 0.35.
$ top -S
last pid: 1038; load averages: 0.22, 0.18, 0.15
67 processes: 5 running, 46 sleeping, 16 waiting
CPU states: 0.0% user, 0.0% nice, 0.1% system, 21.0% interrupt, 78.9% idle
Mem: 6468K Active, 5232K Inact, 23M Wired, 1540K Cache, 8688K Buf, 3849M Free
Swap: 4096M Total, 4096M Free
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root        1 171 ki31     0K     8K CPU3   3  74:15 99.02% idle: cpu3
   12 root        1 171 ki31     0K     8K CPU2   2  74:14 99.02% idle: cpu2
   13 root        1 171 ki31     0K     8K RUN    1  74:10 99.02% idle: cpu1
   24 root        1 -52    -     0K     8K WAIT   0  58:08 83.15% irq9: acpi0
   14 root        1 171 ki31     0K     8K RUN    0  16:05 14.84% idle: cpu0
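A caveat on the interrupt rates quoted above: as far as I know, the rate column of vmstat -i is the cumulative total divided by uptime, i.e. an average since boot, so it may not reflect what the system is doing right now. Below is a rough sketch for measuring the current rate from two samples instead. The helper names irq_total and irq_rate are hypothetical, and the parsing assumes the usual "name total rate" column layout.

```shell
#!/bin/sh
# Extract the cumulative count for one interrupt source from
# `vmstat -i` output read on stdin. $1 is the source name exactly as
# printed at the start of the line, e.g. "irq9: acpi0".
# Assumption: the total is the second-to-last column on that line.
irq_total() {
    awk -v name="$1" 'index($0, name) == 1 { print $(NF - 1); exit }'
}

# Interrupts per second between two cumulative counts.
# $1 = first count, $2 = second count, $3 = seconds between samples.
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Usage on a live system (commented out so the file can be sourced safely):
# a=$(vmstat -i | irq_total "irq9: acpi0"); sleep 10
# b=$(vmstat -i | irq_total "irq9: acpi0")
# irq_rate "$a" "$b" 10
```

Comparing that figure with the rate column should show whether the storm is really tapering off or the since-boot average is just being diluted.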
Note: for kicks I tried rebuilding the kernel with options
MPTABLE_FORCE_HTT and IPI_PREEMPTION, though without any apparent
effect. No device polling, and using SCHED_4BSD for what it's worth.
I don't know what a typical load for a multi-cpu box looks like;
we've only run single-cpu systems here, and even when working our
server loads are typically pretty close to 0.0. Basically we
inherited a bunch of dual Xeon machines and I'd like to make them
work-- of course I can just run them on one cpu but that seems kind
of silly. (Unfortunately I'm just a school administrator and not much
of a hardware guy, so I'm a little out of my depth here...;| )
Thanks for any further assistance anyone can provide.
--
John Berliner
Live Oak School