hardclock interrupt deadlock
Luigi Rizzo
rizzo at icir.org
Thu Oct 16 08:49:13 PDT 2003
On Thu, Oct 16, 2003 at 11:17:50AM -0400, Michael Marchetti wrote:
> Hi,
>
> We have encountered a problem where the system hangs. We are running a 4.7
> SMP kernel using kernel polling on a Dual Xeon with hyperthreading enabled
I'm puzzled by what you mean by "kernel polling". DEVICE_POLLING,
if that is what you mean, cannot work with SMP -- it should not even
build unless you manually disabled the check.
luigi
> (essentially a 4 processor system). As a result, the only HW interrupts in
> the system are hardclock (8254), the rtc, serial console and scsi. The
> synchronous interrupts are the 8254 and the rtc. When the system is hung, I have
> found that the ipending and iactive bits for the 8254 and rtc are set
> (meaning the interrupt is pending and active) although giant lock is not
> held and all processors are idle (and halted). This led me to believe that
> somehow the ipending bit was set "just before" the last interrupt returned.
> The only way the system would be able to run that interrupt again is if
> another interrupt would run and it would notice that ipending is set, and it
> would run (an interrupt delay would be seen). In a non-polling system, I
> imagine the ethernet interrupts would wake it up. I believe I found a
> potential hole where this could happen.
>
> In i386/isa/ipl.s:
>
> #ifdef SMP
> cli /* early to prevent INT deadlock */
> doreti_next2:
> #endif
> movl %eax,%ecx
> notl %ecx /* set bit = unmasked level */
> #ifndef SMP
> cli
> #endif
> andl _ipending,%ecx /* set bit = unmasked pending INT */
> jne doreti_unpend
> movl %eax,_cpl
>
> I'm concerned about the case where ipending is checked and found not to be
> set, but immediately afterwards another interrupt occurs and sets ipending.
> Because _cpl has not yet been unmasked, that interrupt is not forwarded. In
> particular, in i386/isa/apic_vector.s:
>
> 3: ; /* other cpu has isr lock */ \
> APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ; \
> lock ; \
> orl $IRQ_BIT(irq_num), _ipending ; \
> testl $IRQ_BIT(irq_num), _cpl ; \
> jne 4f ; /* this INT masked */ \
> call forward_irq ; /* forward irq to lock holder */ \
> POP_FRAME ; /* and return */ \
> iret ; \
> ALIGN_TEXT ; \
>
> The test of _cpl occurs right after ipending is set, which leaves a potential
> race between this check and the update of _cpl in ipl.s.
>
> One quick fix that I thought might correct this would be, in ipl.s, to
> recheck ipending right after modifying _cpl, to see whether it has
> changed, such as:
>
>
> #ifdef SMP
> cli /* early to prevent INT deadlock */
> doreti_next2:
> #endif
> movl %eax,%ecx
> notl %ecx /* set bit = unmasked level */
> #ifndef SMP
> cli
> #endif
> andl _ipending,%ecx /* set bit = unmasked pending INT */
> jne doreti_unpend
> movl %eax,_cpl
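> movl %eax,%ecx /* reload: %ecx is 0 once the first test falls through */
> notl %ecx /* set bit = unmasked level */
> andl _ipending,%ecx /* recheck: set bit = unmasked pending INT */
> jne doreti_unpend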
>
>
> Any opinions/insight?
>
> thanks.
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"