STI, HLT in acpi_cpu_idle_c1
Matthew Dillon
dillon at apollo.backplane.com
Wed Jun 23 05:40:22 GMT 2004
:Thanks for the detailed info on this. It looks like CPU1 is trying
:to service the interrupt because PPR = 0xf0, and TPR = 0x00. It is
Yup.
:also the only CPU that has a bit set in ISR. In this case, CPU 3
:was initiating the IPI (although I don't know why its icr_lo is
:0xc00f6, because I was expecting it to be 0xc00f3, as it was in
:previous lockups). I still have no idea why CPU1 is not handling
:this interrupt though. I am still getting used to this emulator, but
:I think the values I am reading are believable:
Yup. It would be nice to see the contents of the TMR and IRR
as well, but the ISR is very important. If the ISR bit is set
it means that the APIC delivered the interrupt to the cpu but
the cpu has not yet EOI'd it. This would account for the contents
of the PPR, which (with TPR at 0) is based on the highest ISR bit
that is set. When you EOI the APIC, it retires that same highest
in-service interrupt.
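
To make the ISR/PPR relationship concrete, here is a small sketch that decodes the values from the register dump quoted further down. It assumes that "ISR7" in the dump is the eighth 32-bit ISR register (covering vectors 224-255) and that the PPR follows the usual rule of taking the higher of the TPR priority class and the in-service priority class. The function names are made up for illustration.

```python
def isr_vectors(reg_index, value):
    """Return the vectors whose in-service bit is set in ISR register reg_index (0..7).

    Assumption: each ISR register holds 32 consecutive vectors, so register 7
    covers vectors 224..255.
    """
    return [reg_index * 32 + bit for bit in range(32) if value & (1 << bit)]


def ppr(tpr, highest_isr_vector):
    """Processor priority: the higher of the TPR class and the in-service class."""
    if (tpr & 0xF0) >= (highest_isr_vector & 0xF0):
        return tpr & 0xFF
    # The in-service class wins; the low nibble of the PPR reads as 0.
    return highest_isr_vector & 0xF0


# CPU 1 from the dump: ISR7 = 0x80000, TPR = 0x0
vectors = isr_vectors(7, 0x80000)
print([hex(v) for v in vectors])     # ['0xf3']
print(hex(ppr(0x00, max(vectors))))  # 0xf0
```

Under those assumptions the in-service vector on CPU 1 is 0xf3, and the computed PPR of 0xf0 matches the value reported for that cpu.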
It is possible that the cpu got the interrupt and started processing it,
then deadlocked in a mutex and went to idle before it had a chance to
EOI it. I haven't looked at the IPI handling path in 5.x recently so this
might not be a possible case, but so far it's the only thing that fits
the bill. Perhaps it tried to use a sleep mutex when it really should
have been using a spin mutex.
Once the APIC has delivered the interrupt to the cpu nothing more
happens to that interrupt until it is EOI'd. If the cpu has not EOI'd
that interrupt and HLTs, it will *NOT* be reinterrupted by that
particular interrupt.
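
The behavior described above can be illustrated with a deliberately simplified toy model. This is not how the hardware or the FreeBSD code is implemented: `ToyLocalApic` and its methods are hypothetical, the TPR is assumed to be 0, and only the priority-class comparison is modeled.

```python
class ToyLocalApic:
    """Toy model of one local APIC; TPR is assumed to be 0 throughout."""

    def __init__(self):
        self.irr = set()  # vectors accepted by the APIC, not yet handed to the cpu
        self.isr = set()  # vectors handed to the cpu, not yet EOI'd

    def raise_vector(self, vector):
        """An interrupt (for example an IPI) arrives at this APIC."""
        self.irr.add(vector)

    def dispatch(self):
        """Hand the highest pending vector to the cpu if its priority class allows it."""
        if not self.irr:
            return None
        vector = max(self.irr)
        in_service_class = max(self.isr, default=0) & 0xF0
        if (vector & 0xF0) <= in_service_class:
            return None  # blocked behind an interrupt that was never EOI'd
        self.irr.remove(vector)
        self.isr.add(vector)
        return vector

    def eoi(self):
        """Retire the highest in-service vector."""
        if self.isr:
            self.isr.remove(max(self.isr))


apic = ToyLocalApic()
apic.raise_vector(0xF3)
first = apic.dispatch()   # delivered, ISR bit now set
apic.raise_vector(0xF3)
second = apic.dispatch()  # None: the cpu never EOI'd, so nothing is delivered
apic.eoi()
third = apic.dispatch()   # delivered once the EOI has happened
print(first, second, third)
```

In this model a cpu that halts between the first dispatch and the EOI would sit in HLT indefinitely as far as that vector (and anything in the same or a lower priority class) is concerned.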
What I would do now is (attempt to) get some sort of core that you can
gdb on or attach gdb to. I'm afraid I can't help much there. But what
you want to do is get a stack backtrace for every single thread in the
system, looking for a stuck mutex in an IPI path interrupting the normal
thread's operation. Again, I don't know if this scenario is possible,
but it's the only thing I can think of. Perhaps John can narrow down
the possibilities some more.
On the ICR_LO values: I don't know what one should expect for the vector
portion. The 'c' in the c00f6 is definitely correct (the last command
was sent to all excluding self). The remaining control and status fields
are 0 (edge trigger, APIC has accepted the command, physical mode, fixed
delivery). Since all APICs have S=0 (delivery status idle) we know that
the problem is NOT an APIC-APIC deadlock. The APICs look to be in good shape.
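
For reference, the field breakdown above can be reproduced with a short decoder. This is a sketch that assumes the standard local APIC layout for the low ICR word (vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, delivery status in bit 12, trigger mode in bit 15, destination shorthand in bits 18-19). The function name and the dictionary keys are invented for illustration.

```python
DELIVERY_MODES = ["fixed", "lowest priority", "SMI", "reserved",
                  "NMI", "INIT", "startup", "reserved"]
SHORTHANDS = ["none", "self", "all including self", "all excluding self"]


def decode_icr_lo(value):
    """Split the low 32 bits of the ICR into its fields (assumed standard layout)."""
    return {
        "vector": value & 0xFF,
        "delivery_mode": DELIVERY_MODES[(value >> 8) & 0x7],
        "dest_mode": "logical" if value & (1 << 11) else "physical",
        "delivery_status": "send pending" if value & (1 << 12) else "idle",
        "trigger": "level" if value & (1 << 15) else "edge",
        "shorthand": SHORTHANDS[(value >> 18) & 0x3],
    }


# CPU 3 from the dump
for name, field in decode_icr_lo(0xC00F6).items():
    print(name, hex(field) if isinstance(field, int) else field)
```

For 0xc00f6 this gives vector 0xf6, fixed delivery, physical mode, idle delivery status, edge trigger, and the all-excluding-self shorthand, which agrees with the reading given above.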
:CPU 0
:TPR: 0x0 << not priority masked
:PPR: 0x0 << nothing pending
:icr_lo:0xf3 << ready to accept command
:CPU 1
:ID: 0x7000000
:TPR: 0x0 << not priority masked
:PPR: 0xf0 << priority of delivered but not-yet EOI'd interrupt
:icr_lo:0xf3 << ready to accept command
:ISR7: 0x80000 << interrupt delivered and is in-service, not yet EOI'd
:CPU 2
:TPR: 0x0 << not priority masked
:PPR: 0x0 << nothing pending
:icr_lo:0xfb << ready to accept command
:CPU 3
:ID: 0x1000000
:TPR: 0x0 << not priority masked
:PPR: 0x0 << nothing pending
:icr_lo:0xc00f6 << ready to accept new command (previous command was
accepted), last sent command was IPI to all-but-self.
:
:Gerrit
-Matt
Matthew Dillon
<dillon at backplane.com>