SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)

Glen glen.leeder at nokia.com
Thu Nov 1 18:35:33 PDT 2007


ext Nate Lawson wrote:
> Glen wrote:
>   
>> Hi,
>>
>> I have been seeing intermittent hangs in the ACPI shutdown code on an
>> Intel 2.4GHz 8 CPU system. I am running a FreeBSD 6.1 code base
>> but cannot see a reason why this can't happen in other FreeBSD versions.
>> The hang is very irregular; I am recreating it using an expect script
>> that repeatedly reboots the system. Sometimes, I can do up to 200
>> reboots before observing the hang, sometimes, it happens after 5-20
>> reboots.
>>
>> It has been difficult to pin down the hang as the system is not
>> responding to NMI events, but using breakpoints I believe the hang is in
>> acpi_cpu.c:acpi_cpu_shutdown with the call to smp_rendezvous.
>>     
>
> First, thank you for your careful debugging help.  This is wonderful.
>
>   
>> My theory is that one of the CPUs does not respond to ipi_all_but_self
>> and that all the other CPUs are waiting for it in smp_rendezvous_action.
>> The smp_rv_waiters[0] < mp_ncpus condition never gets met and the system
>> hangs. This may happen due to other activity (or a deadlock?) on that
>> CPU.
>>
>> I noticed a few threads relating to this and have already tried stuff
>> like changing kern.sched.ipiwakeup.enabled & machdep.cpu_idle_hlt.
>> Neither had any effect.
>>     
>
> Very interesting.  I didn't think anything could cause an IPI not to get
> delivered eventually but during shutdown interrupts may be disabled at
> some point.
>
>   

It was only a theory; I couldn't think of any other reason why one of
the CPUs doesn't rendezvous. Interrupts being disabled is a good reason,
though!
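
To make the theory concrete, here is a minimal user-space sketch of the
wait condition described above. It is not the kernel source: rv_waiters
stands in for smp_rv_waiters[0], the ncpus argument for mp_ncpus, and
rv_arrive(), rv_wait_bounded() and simulate() are hypothetical names
invented for this illustration. The spin bound is also an assumption,
added only so the example terminates; the loop described above would
spin forever if one CPU never arrives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's smp_rv_waiters[0]. */
static atomic_int rv_waiters;

/* Called once by each simulated CPU that answers the IPI. */
static void rv_arrive(void)
{
    atomic_fetch_add(&rv_waiters, 1);
}

/*
 * Poll until ncpus CPUs have arrived, giving up after max_spins polls.
 * Returns true if the rendezvous completed, false if it would have hung.
 */
static bool rv_wait_bounded(int ncpus, long max_spins)
{
    assert(ncpus > 0);
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load(&rv_waiters) >= ncpus)
            return true;
    }
    return false;
}

/* Simulate a rendezvous in which only `responding` of `ncpus` CPUs arrive. */
static bool simulate(int ncpus, int responding, long max_spins)
{
    atomic_store(&rv_waiters, 0);
    for (int cpu = 0; cpu < responding; cpu++)
        rv_arrive();
    return rv_wait_bounded(ncpus, max_spins);
}
```

With 8 CPUs, simulate(8, 8, 1000) completes, while simulate(8, 7, 1000)
never reaches the count and reports failure, which matches the
symptom of every other CPU waiting on the one that did not respond.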

>> 1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown and
>> this stops the hang from happening. Does anyone know the purpose of this
>> call in the shutdown code or if I might suffer some consequence by
>> removing it?
>>     
>
> Yes, I put it in to break all APs out of their potential C1-3 sleep.
> This way they are not halted when shutdown needs to synchronize and stop
> them.  But that code sends its own IPI so there is no reason to do it
> again here.  I will remove smp_rendezvous() now.
>   

It sounds like removing smp_rendezvous is a safe thing to do, thanks for 
your insight.

>   
>> 2) Has anyone got any suggestions for debugging this further given that
>> I can't break into the debugger? I thought I could maybe instrument some
>> counters in i386/i386/local_apic.c & kern_smp.c with the aim of
>> identifying a root cause.
>>     
>
> Sounds reasonable.  Thanks again for a detailed problem report.
>
>   

I will notify the list if I find anything further regarding this 
problem. Thanks for your response.
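
As a rough illustration of the counter idea from point 2, the sketch
below keeps a "sent" and a "handled" count per CPU and reports the first
CPU whose counts disagree. It is a user-space model only: MAX_CPUS, the
two arrays and all three function names are hypothetical and do not
correspond to anything in local_apic.c or kern_smp.c. In a real kernel
the counters would need to be readable after the hang, which this
sketch does not address.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_CPUS 8  /* assumed size, matching the 8 CPU system above */

/* Hypothetical per-CPU counters; static storage starts at zero. */
static atomic_uint ipi_sent[MAX_CPUS];
static atomic_uint ipi_handled[MAX_CPUS];

/* Record that an IPI was sent to the given CPU. */
static void count_ipi_sent(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    atomic_fetch_add(&ipi_sent[cpu], 1);
}

/* Record that the given CPU ran its IPI handler. */
static void count_ipi_handled(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    atomic_fetch_add(&ipi_handled[cpu], 1);
}

/*
 * Return the lowest-numbered CPU that has been sent more IPIs than it
 * has handled, or -1 if every CPU is caught up.
 */
static int first_lagging_cpu(int ncpus)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (atomic_load(&ipi_handled[cpu]) < atomic_load(&ipi_sent[cpu]))
            return cpu;
    }
    return -1;
}
```

If CPU 0 sends to CPUs 1 through 7 and every target except CPU 5 runs
its handler, first_lagging_cpu(8) returns 5, pointing at the CPU to
investigate.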


More information about the freebsd-acpi mailing list