Cubieboard: Spurious interrupt detected

Sat Sep 6 01:13:06 UTC 2014

On Fri, 2014-09-05 at 17:44 -0700, Adrian Chadd wrote:
> On 5 September 2014 16:11, Ian Lepore <ian at freebsd.org> wrote:
> > On Fri, 2014-09-05 at 23:57 +0200, Bernd Walter wrote:
> >> On Sat, Sep 06, 2014 at 01:43:23AM +0400, Maxim V FIlimonov wrote:
> >> > And another problem: every now and then the kernel says something like this:
> >> > Sep  5 19:22:37  kernel: Spurious interrupt detected
> >> > Sep  5 19:22:37  kernel: Spurious interrupt detected
> >> > Sep  5 19:23:46  last message repeated 10 times
> >> >
> >> > I've heard that FreeBSD happens to do that on ARM devices. What could be the
> >> > problem here?
> >>
> >> Means something generates interrupts, which are not handled by a driver.
> >> Could be the cause for your load problem too.
> >>
> >
> > No, that would be stray interrupts.  Spurious interrupts happen when an
> > interrupt is asserted, but by the time the processor asks the interrupt
> > controller for the current active interrupt, it is no longer active.
> >
> > One way it can happen is when an interrupt handler writes to a device to
> > clear a pending interrupt and that write takes a long time to complete
> > because the device is on a slow bus, and the interrupt controller is on
> > a faster bus.  The EOI to the controller outraces the device write that
> > would clear the pending interrupt condition, so the processor is
> > re-interrupted, but by the time it asks for the next active interrupt the
> > device write has finally completed and the interrupt is no longer
> > pending.
> >
> > That sequence used to happen a lot, and it was "fixed" by adding an
> > l2cache sync (basically a "drain write buffer") just before an EOI.  You
> > sometimes still see an occasional spurious interrupt, but it shouldn't
> > be happening multiple times per second as seen in the logging above.
> 
> Hm, interesting. I remember your discussion about it on IRC. The
> atheros code ends up working around this in the driver by doing a read
> from the ISR after writing out bits to clear things, so the clear is
> flushed out.
> 
> I wonder if we should be asking all device drivers to be doing their
> own ISR flushing before returning from their interrupt handlers.

That's a big job; it means modifying more or less every driver in the
system.  Quoting part of the comment block I added along with the fix:

        The right way to fix this problem is for every device driver to
        use the proper bus_space_barrier() calls in its interrupt
        handler.  For ARM a single barrier call at the end of the
        handler would work.  This would have to be done to every driver
        in the system, not just arm-specific drivers.  

        Another potential fix is to map all device memory as
        Strongly-Ordered rather than Device memory, which takes the
        store buffers out of the picture.  This has a pretty big impact
        on overall system performance, because each strongly ordered
        memory access causes all L2 store buffers to be drained.  

        A compromise solution is to have the interrupt controller
        implementation call this function to establish a barrier between
        writes to the interrupt-source device and writes to the
        interrupt controller device.  

        This takes the interrupt number as an argument, and currently
        doesn't use it.  The plan is that maybe some day there is a way
        to flag certain interrupts as "memory barrier safe" and we can
        avoid this overhead with them.  

-- Ian
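
For readers following the thread, the race described above (a posted
"clear" write to a slow device losing to the EOI on a faster bus) can be
sketched as a toy userland model.  This is only an illustration under
stated assumptions, not FreeBSD code: struct model, dev_write_clear(),
drain_writes() and service_irq() are hypothetical names, and
drain_writes() merely stands in for whatever actually flushes the write
(a bus_space_barrier() call, the l2cache sync before EOI, or a read back
from the device).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one level-triggered device behind a
 * posted-write buffer, plus an interrupt controller that re-samples
 * the device's line when it receives the EOI. */
struct model {
    bool dev_irq_asserted;  /* device is still driving its interrupt line */
    bool clear_in_flight;   /* "clear" write is posted but not yet at the device */
    int  spurious;          /* spurious interrupts observed */
};

/* The handler writes the device's clear register; the write is only posted. */
static void dev_write_clear(struct model *m)
{
    m->clear_in_flight = true;
}

/* Stand-in for a barrier, write-buffer drain, or read-back: the posted
 * write reaches the device and the interrupt line drops. */
static void drain_writes(struct model *m)
{
    if (m->clear_in_flight) {
        m->dev_irq_asserted = false;
        m->clear_in_flight = false;
    }
}

/* Service one device interrupt and return how many spurious interrupts
 * follow it in this model. */
int service_irq(bool barrier_before_eoi)
{
    struct model m = { .dev_irq_asserted = true };

    dev_write_clear(&m);            /* handler clears the source */
    if (barrier_before_eoi)
        drain_writes(&m);           /* make sure the clear has landed */

    /* EOI: the controller re-samples the line and re-interrupts the
     * processor if the device still appears to be asserting it. */
    bool reinterrupted = m.dev_irq_asserted;

    /* The slow write lands before the processor gets around to asking
     * the controller which interrupt is active. */
    drain_writes(&m);

    if (reinterrupted && !m.dev_irq_asserted)
        m.spurious++;               /* interrupted, but nothing is pending */

    assert(!m.clear_in_flight);     /* all posted writes have landed by now */
    return m.spurious;
}
```

In this model service_irq(false) reports one spurious interrupt and
service_irq(true) reports none, which is the effect the barrier before
EOI is meant to have.  Real hardware timing is of course not
deterministic like this.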