x86: finding interrupts that aren't being accounted for?
John Baldwin
jhb at freebsd.org
Wed Apr 8 16:12:19 UTC 2015
On Monday, April 06, 2015 02:16:23 PM Adrian Chadd wrote:
> On 6 April 2015 at 14:15, Rui Paulo <rpaulo at me.com> wrote:
> >
> >> On Apr 6, 2015, at 13:38, Adrian Chadd <adrian at FreeBSD.org> wrote:
> >>
> >> On 6 April 2015 at 12:18, John Baldwin <jhb at freebsd.org> wrote:
> >>> On Monday, April 06, 2015 12:21:29 AM Adrian Chadd wrote:
> >>>> Hi,
> >>>>
> >>>> I have an .. odd problem on a Lenovo X230.
> >>>>
> >>>> I just threw in a very old wifi card (Intel 3945) into the expresscard
> >>>> (pcie) slot. Now, we don't have any pcie-hp support in -HEAD just yet,
> >>>> but I wasn't expecting the system to crawl to a halt.
> >>>>
> >>>> When I unplug it, everything returns to normal.
> >>>>
> >>>> Other cards don't do this.
> >>>>
> >>>> So, I figured it may be interrupt spam - but vmstat -ia shows no
> >>>> interrupts going crazy.
> >>>>
> >>>> pmcstat -S CPU_CLK_UNHALTED_CORE -T -w 5 doesn't register anything
> >>>> either - only a handful of background samples.
> >>>>
> >>>> However, /counter/ mode pmc tells a different story - pmcstat -s
> >>>> CPU_CLK_UNHALTED_CORE -w 1 shows all four cores going at 110% when the
> >>>> card is inserted, with brief periods of idle. Once I remove the card,
> >>>> the counters go back down to zero.
> >>>>
> >>>> My working theory is: something is chewing CPU and it's likely
> >>>> interrupts, but if it is, it's something far, far earlier than the x86
> >>>> interrupt C code, which counts interrupts and spurious events.
> >>>>
> >>>> So - has anyone diagnosed this stuff on FreeBSD/x86 before? I was kind
> >>>> of hoping we'd at least get accurate statistics about spurious
> >>>> interrupts, and if we don't, I'd like to understand why.
> >>>>
> >>>> Thanks!
> >>>
> >>> SMM? Perhaps SMM doesn't hide itself from PMC counters (but it can hide itself
> >>> from samples).
> >>>
> >>> If it is SMM there's not really anything you can do about it. Try getting a
> >>> KTR_SCHED trace and looking at it in schedgraph. When I've seen SMM issues in
> >>> the past it shows up as a hole in the graph where nothing happens in the system.
> >>>
> >>> In your case you could perhaps be getting PCI errors that are triggering the
> >>> SMM handler. Perhaps compare pciconf -le before and after to see if there are
> >>> any changes.
> >>
> >> Hm, ok. Can we extract PCIe errors yet?
> >
> > Yes, check pciconf.
>
> I'll try, but the system is pretty unusable whilst the card is plugged in...
PCI errors latch. You can run 'pciconf -le' after you yank the card back out.
I would just do this:
'pciconf -le > before'
<insert card and yank it back out>
'pciconf -le > after'
Compare before and after using something like 'kompare'.
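
If no graphical diff tool is at hand, plain diff(1) does the same job. The
following is only a rough sketch of the steps above: the helper function
names and the 'before'/'after' file names are illustrative assumptions, and
'pciconf -le' is the only pciconf invocation used.

```sh
#!/bin/sh
# Sketch: snapshot 'pciconf -le' output and diff two snapshots.
# Function names and file names are hypothetical, chosen for illustration.

# Save the current PCI device list, including error status, to file $1.
snapshot_pci_errors() {
    pciconf -le > "$1"
}

# Show what changed between two snapshots ($1 = before, $2 = after).
# diff exits non-zero when the files differ, so mask that status to keep
# callers running under 'set -e' from aborting.
compare_pci_errors() {
    diff -u "$1" "$2" || true
}

# Typical use:
#   snapshot_pci_errors before
#   <insert card and yank it back out>
#   snapshot_pci_errors after
#   compare_pci_errors before after
```

Lines prefixed with '+' in the output are error bits that latched while the
card was in the slot.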
--
John Baldwin
More information about the freebsd-arch mailing list