Cubieboard: Spurious interrupt detected

Sat Sep 6 07:11:06 UTC 2014

Adrian Chadd wrote this message on Fri, Sep 05, 2014 at 23:45 -0700:
> On 5 September 2014 21:54, John-Mark Gurney <jmg at funkthat.com> wrote:
> > Ian Lepore wrote this message on Fri, Sep 05, 2014 at 19:33 -0600:
> >> On Fri, 2014-09-05 at 18:15 -0700, John-Mark Gurney wrote:
> >> > Adrian Chadd wrote this message on Fri, Sep 05, 2014 at 17:44 -0700:
> >> > > On 5 September 2014 16:11, Ian Lepore <ian at freebsd.org> wrote:
> >> > > > On Fri, 2014-09-05 at 23:57 +0200, Bernd Walter wrote:
> >> > > >> On Sat, Sep 06, 2014 at 01:43:23AM +0400, Maxim V FIlimonov wrote:
> >> > > >> > And another problem: every now and then the kernel says something like this:
> >> > > >> > Sep  5 19:22:37  kernel: Spurious interrupt detected
> >> > > >> > Sep  5 19:22:37  kernel: Spurious interrupt detected
> >> > > >> > Sep  5 19:23:46  last message repeated 10 times
> >> > > >> >
> >> > > >> > I've heard that FreeBSD happens to do that on ARM devices. What could be the
> >> > > >> > problem here?
> >> > > >>
> >> > > >> It means something generates interrupts that are not handled by a driver.
> >> > > >> Could be the cause for your load problem too.
> >> > > >>
> >> > > >
> >> > > > No, that would be stray interrupts.  Spurious interrupts happen when an
> >> > > > interrupt is asserted, but by the time the processor asks the interrupt
> >> > > > controller for the current active interrupt, it is no longer active.
> >> > > >
> >> > > > One way it can happen is when an interrupt handler writes to a device to
> >> > > > clear a pending interrupt and that write takes a long time to complete
> >> > > > because the device is on a slow bus, and the interrupt controller is on
> >> > > > a faster bus.  The EOI to the controller outraces the device write that
> >> > > > would clear the pending interrupt condition, so the processor is
> >> > > > re-interrupted, but by the time it asks for the next active interrupt, the
> >> > > > device write has finally completed and the interrupt is no longer
> >> > > > pending.
> >> > > >
> >> > > > That sequence used to happen a lot, and it was "fixed" by adding an
> >> > > > l2cache sync (basically a "drain write buffer") just before an EOI.  You
> >> > > > sometimes still see an occasional spurious interrupt, but it shouldn't
> >> > > > be happening multiple times per second as seen in the logging above.
> >> > >
> >> > > Hm, interesting. I remember your discussion about it on IRC. The
> >> > > Atheros code ends up working around this in the driver by doing a read
> >> > > from the ISR after writing out bits to clear things, so the clear is
> >> > > flushed out.
> >> > >
> >> > > I wonder if we should be asking all device drivers to be doing their
> >> > > own ISR flushing before returning from their interrupt handlers.
> >> >
> >> > This is required on PCI (you must do a read to flush the posted/pending
> >> > write)...    So, IMO, yes, all device drivers should properly flush
> >> > their writes to the ISR...
> >> >
> >>
> >> But a driver can't assume that a read is sufficient on all architectures
> >> it may run on.  bus_space_barrier() is the right way.  Also, it's not
> >
> > Except that I don't think even on PCI a bus_space_barrier is sufficient...
> 
> It isn't.
> 
> The device itself may have FIFOs and internal busses that also need to
> be flushed.

Partly this depends upon which bus the device is on...  Though we'd
like our drivers to be bus-agnostic, items like these force them not
to be...

> > I was just looking at i386's implementation of bus_space_barrier and
> > it just does a stack access...  This won't be sufficient to clear any
> > PCI bridges that may have the write still pending...
> 
> Right. The memory barrier semantics right now don't at all guarantee
> that bus and device FIFOs have actually been flushed.
> 
> So I don't think doing it using the existing bus space barrier
> semantics is 'right'. For interrupts, it's highly likely that we do
> actually need device drivers to read from their interrupt register to
> ensure the update has been posted before returning. That's better than
> causing entire L2 cache flushes.
> 
> Question is - can we expose this somehow as a generic device method,
> so the higher bus layers can actually do something with it, or should
> we just leave it to device drivers to correctly do?

Sadly, I really think it's up to the device driver...  Are we going to
require the driver to expose a register that gets read before we EOI
the interrupt?  What if another bus does it differently?

> (Also, do any of the FreeBSD device driver books or the Handbook mention this?)

Well, when I did a quick search, I found an HP guide on writing PCI
drivers...  I googled "pci read after write isr" and the first link was:
http://h21007.www2.hp.com/portal/download/files/unprot/ddk/ddg/chap8.pdf

PCI Transaction Ordering - Write Side Effects

And I just checked the old D&I (The Design and Implementation of the
FreeBSD Operating System; I haven't gotten my new one yet), and its
device driver section is quite small and doesn't mention this
issue...  But it also doesn't talk about how to set up interrupts or
resources...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."