kern/118093: firewire bus reset hogs CPU,
causing data to be lost
freebsd at sopwith.solgatos.com
Thu Dec 18 01:00:12 PST 2008
The following reply was made to PR kern/118093; it has been noted by GNATS.
From: Dieter <freebsd at sopwith.solgatos.com>
To: freebsd-firewire at freebsd.org, bug-followup at freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost
Date: Thu, 18 Dec 2008 00:51:41 +0000
> > : Warner> This can't be the case. There's no SPL involved at all. Maybe
> > : Warner> removing the printfs causes an interrupt to be serviced faster,
> > : Warner> resulting in what appears to be better performance...
> > :
> > : With the printfs, Ethernet is not getting serviced. This
> > : is repeatable and easily reproduced. Without the printfs,
> > : it seems ok.
> > :
> > : If it isn't spl, what is locking out Ethernet?
> > interrupt storm? In old-spl-locked drivers, often times the interrupt
> > would be masked while certain operations happened. In new
> > mutex-locked freebsd, there's no way to block the interrupts, so
> > sometimes old code gets into an interrupt storm, which starves other
> > things. Not sure why printf would trigger this, however...
I tried logging in on the console and "cat /etc/termcap" does
not create a problem. Systat -vmstat reports that IRQ 5 (console)
interrupts go from 0 to 120. CPU:
0.0%Sys 0.6%Intr 0.0%User 0.0%Nice 99.4%Idle
The normal firewire bus reset printfs generate 890 IRQ 5 interrupts
with the shell firing off "fwcontrol -u 1 -r" in a loop.
CPU goes from idle to 100% busy:
98.1%Sys 1.7%Intr 0.3%User 0.0%Nice 0.0%Idle
"fwcontrol -u 0 -r" (the onboard firewire controller) gives
the same 890 IRQ 5 interrupts, but CPU usage is different:
79.2%Sys 20.8%Intr 0.0%User 0.0%Nice 0.0%Idle
A couple of other IRQs go up to 30-40 range, nothing huge.
A single "fwcontrol -u 0 -r" generates 14 interrupts on IRQ 5
and 4 interrupts on 3 other IRQs.
A single "fwcontrol -u 1 -r" generates 14 interrupts on IRQ 5
and 3 interrupts on 2 other IRQs. yawn
With the bus reset printfs commented out, I get:
80.0%Sys 4.8%Intr 13.9%User 0.0%Nice 1.2%Idle
3300 interrupts on the onboard firewire controller
84.8%Sys 1.1%Intr 14.1%User 0.0%Nice 0.0%Idle
3900 interrupts on the PCI card firewire controller
After awhile the console says:
fwohci1: phy read failed(1). i = 100 retry = 0
fwohci1: phy read failed(1). i = 100 retry = 1
fwohci1: phy read failed(1). i = 100 retry = 2
fwohci1: phy read failed(1). i = 100 retry = 3
and the system becomes unresponsive. Reset button required. :-(
Kernel printfs to console appear to take more interrupts and
more CPU than userland output to console.
Seems odd that the onboard controller uses so much CPU for
interrupts in the with-printf case, but there wasn't a
large number on interrupts. The without-printf case had
a LOT more interrupts from the fw controller.
Resetting the firewire bus in an endless loop is not recommended.
I'm not an expert on interrupt storms, but none of this looks
like an interrupt storm to me. (Assuming systat is trustworthy.)
Looks like the kernel printf takes a *lot* of CPU, and
somehow locks out Ethernet.
More information about the freebsd-bugs