SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)

Glen glen.leeder at nokia.com
Thu Nov 1 17:28:15 PDT 2007


Hi,

I have been seeing intermittent hangs in the ACPI shutdown code on an 
Intel 2.4GHz 8 CPU system. I am running with a FreeBSD 6.1 code base 
but cannot see a reason why this can't happen in other FreeBSD versions. 
The hang is very irregular. I am recreating it using an expect script 
that repeatedly reboots the system. Sometimes I can do up to 200 
reboots before observing the hang; sometimes it happens after 5-20 reboots.

It has been difficult to pin down the hang as the system is not 
responding to NMI events, but using breakpoints I believe the hang is in 
acpi_cpu.c:acpi_cpu_shutdown, at the call to smp_rendezvous.

My theory is that one of the CPUs does not respond to ipi_all_but_self 
and that all the other CPUs are waiting for it in smp_rendezvous_action. 
smp_rv_waiters[0] never reaches mp_ncpus, so the wait loop never exits 
and the system hangs. This may happen due to other activity (or a 
deadlock?) on that CPU.
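
To make the theory above concrete, here is a minimal userland sketch of 
an entry barrier that gives up after a bounded number of polls instead 
of spinning forever. It is not the kernel's smp_rendezvous_action: 
rv_wait_bounded, struct rv_result and spin_limit are hypothetical names, 
"waiters" stands in for smp_rv_waiters[0] and "ncpus" for mp_ncpus.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userland model only. All names are hypothetical; "waiters" stands in
 * for smp_rv_waiters[0] and "ncpus" for mp_ncpus.
 */
struct rv_result {
    bool met;      /* true if every CPU checked in */
    int  arrived;  /* waiter count seen when polling stopped */
    long spins;    /* number of unsuccessful polls */
};

static struct rv_result
rv_wait_bounded(atomic_int *waiters, int ncpus, long spin_limit)
{
    struct rv_result r = { false, 0, 0 };

    /* Check in, as each CPU does on entering the rendezvous. */
    atomic_fetch_add(waiters, 1);

    /* An unbounded loop would hang here; this one stops at spin_limit. */
    for (;;) {
        r.arrived = atomic_load(waiters);
        if (r.arrived >= ncpus) {
            r.met = true;
            break;
        }
        if (++r.spins >= spin_limit)
            break;
    }
    return r;
}
```

If the barrier is not met, r.arrived says how many CPUs did check in, 
which would at least confirm whether exactly one CPU is missing.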

I noticed a few threads relating to this and have already tried stuff 
like changing kern.sched.ipiwakeup.enabled & machdep.cpu_idle_hlt. 
Neither had any effect.

1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown and 
this stops the hang from happening. Does anyone know the purpose of this 
call in the shutdown code or if I might suffer some consequence by 
removing it?

2) Has anyone got any suggestions for debugging this further given that 
I can't break into the debugger? I thought I could maybe instrument some 
counters in i386/i386/local_apic.c & kern_smp.c with the aim of 
identifying a root cause.
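
The counter idea in 2) could look roughly like the sketch below: one 
"sent" and one "handled" counter per CPU, plus a helper that reports 
which CPUs have fallen behind. None of these names exist in the FreeBSD 
sources; rv_ipi_sent, rv_ipi_handled, rv_note_sent, rv_note_handled, 
rv_lagging_cpus and SKETCH_MAXCPU are made up for illustration, and the 
code is written as plain C11 rather than with the kernel's own atomic 
primitives.

```c
#include <stdatomic.h>

#define SKETCH_MAXCPU 8   /* assumption: sized for the 8 CPU system */

/* Hypothetical counters; static storage, so they start at zero. */
static atomic_uint rv_ipi_sent[SKETCH_MAXCPU];
static atomic_uint rv_ipi_handled[SKETCH_MAXCPU];

/* Sender side: call once per target CPU when the IPI is issued. */
static void
rv_note_sent(int cpu)
{
    atomic_fetch_add(&rv_ipi_sent[cpu], 1);
}

/* Receiver side: call on entry to the handler on the target CPU. */
static void
rv_note_handled(int cpu)
{
    atomic_fetch_add(&rv_ipi_handled[cpu], 1);
}

/* Returns a bitmask of CPUs with more IPIs sent than handled. */
static unsigned
rv_lagging_cpus(void)
{
    unsigned mask = 0;

    for (int cpu = 0; cpu < SKETCH_MAXCPU; cpu++) {
        if (atomic_load(&rv_ipi_handled[cpu]) <
            atomic_load(&rv_ipi_sent[cpu]))
            mask |= 1u << cpu;
    }
    return mask;
}
```

Since the debugger is unavailable at the point of the hang, the mask 
would still have to be surfaced some other way, for example printed by 
whichever CPU notices the stall.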

Glen

