STI, HLT in acpi_cpu_idle_c1

Wed Jun 23 05:40:22 GMT 2004

:Thanks for the detailed info on this.  It looks like CPU1 is trying 
:to service the interrupt because PPR = 0xf0, and TPR = 0x00.  It is 

    Yup.

:also the only CPU that has a bit set in ISR.  In this case, CPU 3 
:was initiating the IPI, although I don't know why its icr_lo is
:0xc00f6 because I was expecting it to be 0xc00f3 (and it was in 
:previous lockups).  I still have no idea why CPU1 is not handling
:this interrupt though.  I am still getting used to this emulator, but
:I think the values I am reading are believable:

    Yup.  It would be nice to see the contents of the TMR and IRR
    as well, but the ISR is the important one.  If an ISR bit is set
    it means that the APIC delivered the interrupt to the cpu but
    the cpu has not yet EOI'd it.  This would account for the contents
    of the PPR (it is based on the highest in-service ISR bit, or on
    the TPR if that is higher.  When you EOI the APIC, it clears the
    highest-priority in-service bit, which is what the PPR reflects).
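
    As a rough illustration of that ISR/PPR relationship, here is a
    minimal Python sketch.  It assumes the usual local APIC layout in
    which ISR0..ISR7 each cover 32 vectors (ISRn bit b is vector
    32*n + b) and the PPR takes the higher of the TPR class and the
    class of the highest in-service vector.  The function names are
    made up for the example.

```python
# Sketch only: derive the in-service vector and the PPR from raw values.
# Assumption: ISRn bit b corresponds to vector 32*n + b.

def highest_in_service(isr_words):
    """Return the highest vector with an ISR bit set, or None."""
    for n in range(len(isr_words) - 1, -1, -1):
        word = isr_words[n]
        if word:
            return 32 * n + word.bit_length() - 1
    return None

def ppr(tpr, isr_words):
    """Priority class is max(TPR class, class of highest ISR vector)."""
    vec = highest_in_service(isr_words)
    isr_class = (vec & 0xF0) if vec is not None else 0
    tpr_class = tpr & 0xF0
    if tpr_class > isr_class:
        return tpr & 0xFF
    # When the classes are equal the sub-class is model specific;
    # this sketch simply reports 0 for it.
    return isr_class

# CPU 1 from the dump: TPR = 0x0, only ISR7 = 0x80000 is non-zero.
cpu1_isr = [0, 0, 0, 0, 0, 0, 0, 0x80000]
print(hex(highest_in_service(cpu1_isr)))  # 0xf3
print(hex(ppr(0x0, cpu1_isr)))            # 0xf0
```

    With the values quoted for CPU 1 this gives vector 0xf3 in service
    and a PPR of 0xf0, which matches the dump.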

    It is possible that the cpu got the interrupt and started processing it,
    then deadlocked in a mutex and went to idle before it had a chance to
    EOI it.  I haven't looked at the IPI handling path in 5.x recently so this
    might not be a possible case, but so far it's the only thing that fits
    the bill.  Perhaps it tried to use a sleep mutex when it really should
    have been using a spin mutex. 

    Once the APIC has delivered the interrupt to the cpu nothing more
    happens to that interrupt until it is EOI'd.  If the cpu has not EOI'd
    that interrupt and HLTs, it will *NOT* be reinterrupted by that
    particular interrupt.
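
    The blocking behaviour can be shown with a toy model.  This is
    only a sketch under simplifying assumptions (TPR is 0, fixed
    edge-triggered interrupts only, no HLT or cpu state), and the
    class and method names are hypothetical, not anything from the
    kernel source.

```python
class LocalApicModel:
    """Toy model of IRR/ISR handling; assumes TPR == 0."""

    def __init__(self):
        self.irr = set()  # accepted by the APIC, not yet dispatched
        self.isr = set()  # dispatched to the cpu, not yet EOI'd

    def accept(self, vector):
        """An interrupt message arrives at this APIC."""
        self.irr.add(vector)

    def ppr_class(self):
        """Priority class of the highest in-service vector (0 if none)."""
        return (max(self.isr) >> 4) if self.isr else 0

    def dispatch(self):
        """Hand the highest requested vector to the cpu, if allowed."""
        if not self.irr:
            return None
        vec = max(self.irr)
        if (vec >> 4) <= self.ppr_class():
            return None  # held back by the interrupt still in service
        self.irr.discard(vec)
        self.isr.add(vec)
        return vec

    def eoi(self):
        """The cpu signals end-of-interrupt: clear the highest ISR bit."""
        if self.isr:
            self.isr.discard(max(self.isr))

apic = LocalApicModel()
apic.accept(0xF3)
first = apic.dispatch()   # delivered: 0xf3 moves from IRR to ISR
apic.accept(0xF3)
second = apic.dispatch()  # None: same vector is still in service
apic.eoi()
third = apic.dispatch()   # delivered again only after the EOI
```

    In the model, the second 0xf3 sits in the IRR until eoi() is
    called, which is the situation a cpu would be in if it went idle
    with the ISR bit still set.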

    What I would do now is (attempt to) get some sort of core that you can
    run gdb on, or a live system you can attach gdb to.  I'm afraid I can't
    help much there.  But what you want to do is get a stack backtrace for
    every single thread in the system, looking for a stuck mutex in an IPI
    path that interrupted the normal thread's operation.  Again, I don't
    know if this scenario is possible, but it's the only thing I can think
    of.  Perhaps John can narrow down the possibilities some more.

    On the ICR_LO values: I don't know what one should expect for the vector
    portion.  The 'c' in 0xc00f6 is definitely correct (the last command
    was sent to all excluding self).  The status fields are 0
    (edge trigger, APIC has accepted the command, physical mode, fixed
    command).  Since all APICs have S=0 (delivery status idle) we know that
    the problem is NOT an APIC-APIC deadlock.  The APICs look to be in good
    shape.
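
    For reference, a small Python sketch that splits an ICR_LO value
    into the fields discussed above.  It assumes the standard local
    APIC ICR low-word layout (vector in bits 0-7, delivery mode in
    bits 8-10, destination mode in bit 11, delivery status in bit 12,
    trigger mode in bit 15, destination shorthand in bits 18-19).
    decode_icr_lo is a name invented for the example.

```python
SHORTHAND = ("none", "self", "all including self", "all excluding self")
DELIVERY_MODE = ("fixed", "lowest priority", "SMI", "reserved",
                 "NMI", "INIT", "startup", "reserved")

def decode_icr_lo(value):
    """Split an ICR_LO value into its fields (assumed standard layout)."""
    return {
        "vector": value & 0xFF,
        "delivery_mode": DELIVERY_MODE[(value >> 8) & 0x7],
        "dest_mode": "logical" if value & (1 << 11) else "physical",
        "delivery_status": "send pending" if value & (1 << 12) else "idle",
        "trigger": "level" if value & (1 << 15) else "edge",
        "shorthand": SHORTHAND[(value >> 18) & 0x3],
    }

# CPU 3 from the dump.
cpu3 = decode_icr_lo(0xC00F6)
for name, field in cpu3.items():
    print(name, hex(field) if name == "vector" else field)
```

    For 0xc00f6 this reports vector 0xf6, fixed, physical, idle, edge,
    all excluding self.  The 0xf3 read from CPUs 0 and 1 decodes to
    vector 0xf3 with no shorthand and the same idle status.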

:CPU 0
:TPR:   0x0 	<< not priority masked
:PPR:   0x0 	<< nothing pending
:icr_lo:0xf3 	<< ready to accept command
:CPU 1
:ID:    0x7000000 
:TPR:	0x0	<< not priority masked
:PPR:   0xf0 	<< priority of delivered but not-yet EOI'd interrupt
:icr_lo:0xf3 	<< ready to accept command
:ISR7:  0x80000 << interrupt delivered and is in-service, not yet EOI'd
:CPU 2
:TPR:	0x0	<< not priority masked
:PPR:   0x0 	<< nothing pending
:icr_lo:0xfb 	<< ready to accept command
:CPU 3
:ID:    0x1000000 
:TPR:   0x0 	<< not priority masked
:PPR:   0x0 	<< nothing pending
:icr_lo:0xc00f6 << ready to accept new command (previous command was
:               << accepted), last sent command was IPI to all-but-self.
:
:Gerrit

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>