Weird PCI interrupt delivery problem (resolution, sort of)

Mon Jan 23 18:25:16 PST 2006

On Fri, Jan 20, 2006 at 03:42:21PM -0500, John Baldwin wrote:
> On Thu, Jan 19, 2006 at 10:17:39PM -0700, Scott Long wrote:
> > This points to a bus coherency problem.  I wonder if your BIOS is
> > incorrectly setting the memory region of the apics as cachable.  You'll
> > want to bug Baldwin about this.
> 
> Hmm, well, you can actually try the PAT patch if you are feeling brave as it 
> maps all devices (including APICs) as uncacheable.

Tried the updated PAT patch (with s/pmap_unmapbios/pmap_unmap_bios/ to
get ACPI to compile).  Unfortunately if it is a caching problem, PAT
isn't able to fix it.  Same result as stock kernel -- interrupts stop
arriving after a dozen or so.  AFAICT the local APIC is the only
memory-mapped I/O region that seems to be problematic.

Instead of writing the value twice, I also tried inserting an
__asm("nop") before the write with no effect.  Also, a single write to
an unrelated area doesn't help:

+static volatile int dummyeoi;
+
 lapic_eoi(void)
 {

+	dummyeoi = 1;
 	lapic->eoi = 0;
+	dummyeoi = 2;
 }

I'm _reasonably_ certain that marking dummyeoi volatile and leaving it
uninitialized will prevent gcc from optimizing that out.  Forcing R/W
cycles (++dummyeoi) before and after doesn't work either.
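
For anyone who wants to poke at the compiler side of this outside the
kernel, here is a minimal userland sketch of the two variants described
above.  The names fake_eoi, fake_lapic_eoi, fake_lapic_eoi_rw and
fake_eoi_selftest are made up for illustration; fake_eoi is an ordinary
variable standing in for lapic->eoi, so this says nothing about the
real hardware.  It only relies on the C rule that every access to a
volatile object has to be performed as written.

```c
#include <assert.h>

/* Hypothetical stand-ins: plain memory, not a real memory-mapped APIC. */
static volatile int dummyeoi;
static volatile unsigned int fake_eoi = 1;	/* plays the role of lapic->eoi */

/* Variant 1: plain stores to an unrelated location around the EOI write. */
static void
fake_lapic_eoi(void)
{

	dummyeoi = 1;
	fake_eoi = 0;
	dummyeoi = 2;
}

/* Variant 2: read-modify-write cycles (++dummyeoi) before and after. */
static void
fake_lapic_eoi_rw(void)
{

	++dummyeoi;
	fake_eoi = 0;
	++dummyeoi;
}

/* Checks that the stores really land; volatile accesses may not be elided. */
static void
fake_eoi_selftest(void)
{

	dummyeoi = 0;
	fake_lapic_eoi();
	assert(dummyeoi == 2 && fake_eoi == 0);
	fake_lapic_eoi_rw();
	assert(dummyeoi == 4);
}
```

Compiling this with optimization and looking at the generated assembly
should show all of the stores to dummyeoi still present, which is the
property the kernel test above depends on.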

A DELAY(1) before the lapic->eoi write does the trick, but DELAY does
lots of complicated things so I don't know how useful of a data point
that is.

I'm probably missing something, but if bad cache behavior was causing
writes to the lapic EOI register to not always take effect, wouldn't the
_next_ irq (even if it's a different line) cause the one that's
currently pending to be acknowledged?
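
To make the question concrete, here is a toy model of how I understand
the architectural behavior: an EOI write clears the highest-priority
in-service bit.  Everything in it is a simplification and an
assumption on my part, not something verified on this hardware: the
names (toy_lapic, toy_highest, toy_deliver, toy_eoi) are invented, it
tracks only 32 vectors, and it ranks priority by plain vector number
rather than by priority class.

```c
#include <assert.h>
#include <stdint.h>

/* Toy in-service register: one bit per vector, higher bit = higher priority. */
struct toy_lapic {
	uint32_t isr;
};

/* Return the highest set bit number, or -1 if no bit is set. */
static int
toy_highest(uint32_t bits)
{
	int v;

	for (v = 31; v >= 0; v--)
		if (bits & (UINT32_C(1) << v))
			return (v);
	return (-1);
}

/*
 * Deliver vector v only if it outranks everything already in service.
 * Returns 1 if delivered, 0 if it is held off.
 */
static int
toy_deliver(struct toy_lapic *l, int v)
{

	assert(v >= 0 && v < 32);
	if (v <= toy_highest(l->isr))
		return (0);
	l->isr |= UINT32_C(1) << v;
	return (1);
}

/*
 * EOI: clear the highest in-service bit.  A nonzero 'lost' models a
 * write that never takes effect.
 */
static void
toy_eoi(struct toy_lapic *l, int lost)
{
	int v = toy_highest(l->isr);

	if (!lost && v >= 0)
		l->isr &= ~(UINT32_C(1) << v);
}
```

In this model, if the EOI for vector 5 is lost, vector 3 is held off,
while vector 9 is still delivered and its EOI clears only bit 9.  Bit 5
stays set until one extra EOI arrives.  So under these assumptions a
later irq would only "fix" the stale one if its EOI write landed while
nothing of higher priority was in service.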

Craig