SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)

John Baldwin jhb at freebsd.org
Fri Nov 2 16:38:44 PDT 2007


On Friday 02 November 2007 12:01:35 am Nate Lawson wrote:
> Glen wrote:
> > Hi,
> > 
> > I have been seeing intermittent hangs in the acpi shutdown code on an
> > Intel 2.4GHz 8 CPU system. I am running with a FreeBSD 6.1 code base
> > but cannot see a reason why this can't happen in other FreeBSD versions.
> > The hang is very irregular; I am reproducing it using an expect script
> > that repeatedly reboots the system. Sometimes I can do up to 200
> > reboots before observing the hang; sometimes it happens after 5-20
> > reboots.
> > 
> > It has been difficult to pin down the hang as the system is not
> > responding to NMI events, but using breakpoints I believe the hang is in
> > acpi_cpu.c:acpi_cpu_shutdown, at the call to smp_rendezvous.
> > 
> > My theory is that one of the CPUs does not respond to ipi_all_but_self
> > and that all the other CPUs are waiting for it in smp_rendezvous_action.
> > The smp_rv_waiters[0] < mp_ncpus condition never gets met and the system
> > hangs. This may happen due to other activity (or a deadlock?) on that
> > CPU.
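
The theory above can be illustrated with a small user-space model. This is
only a sketch, not the kernel code: struct rv_model, rv_checkin(),
rv_wait_entry() and rv_simulate() are hypothetical names invented for the
illustration, and the spin bound (max_spins) is an assumption added so the
model terminates instead of hanging the way the report describes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the rendezvous entry counter in the report. */
struct rv_model {
    int ncpus;    /* CPUs expected to check in (mp_ncpus in the report) */
    int waiters;  /* CPUs that have checked in (smp_rv_waiters[0]) */
};

/* A CPU that services the IPI bumps the entry counter once. */
static void rv_checkin(struct rv_model *rv)
{
    rv->waiters++;
}

/*
 * Entry wait with an artificial bound. The report describes a wait on
 * "waiters < ncpus" that never completes; the bound here exists only so
 * that this model can report the stuck case instead of spinning forever.
 * Returns true if every CPU checked in, false if the bound was reached.
 */
static bool rv_wait_entry(const struct rv_model *rv, long max_spins)
{
    long spins = 0;

    while (rv->waiters < rv->ncpus) {
        if (++spins >= max_spins)
            return false;
    }
    return true;
}

/* responding[i] says whether (modelled) CPU i ever services the IPI. */
static bool rv_simulate(const bool *responding, int ncpus, long max_spins)
{
    struct rv_model rv = { .ncpus = ncpus, .waiters = 0 };

    for (int i = 0; i < ncpus; i++) {
        if (responding[i])
            rv_checkin(&rv);
    }
    return rv_wait_entry(&rv, max_spins);
}
```

With all eight modelled CPUs responding, rv_simulate() returns true. If any
single CPU never checks in, the condition stays false and every other CPU
would wait indefinitely, which matches the symptom described.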
> > 
> > I noticed a few threads relating to this and have already tried stuff
> > like changing kern.sched.ipiwakeup.enabled & machdep.cpu_idle_hlt.
> > Neither had any effect.
> > 
> > 1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown and
> > this stops the hang from happening. Does anyone know the purpose of this
> > call in the shutdown code or if I might suffer some consequence by
> > removing it?
> 
> I have one more thing I needed to consider.  There's a race where a
> thread could be entering acpi_cpu_idle() to read a C2-3 register but
> that register state gets destroyed with the softc before the read.  In
> that case, I thought there could be a panic, hence why I originally put
> in the smp_rendezvous().  However, I don't think device_shutdown() frees
> softcs (need to look in the newbus code to be sure).  So I still should
> be able to remove this code after checking more closely.

It does not.  Only detach does.

-- 
John Baldwin


More information about the freebsd-acpi mailing list