stop_cpus*() interface
John Baldwin
jhb at freebsd.org
Thu Jun 23 12:51:59 UTC 2011
On Wednesday, June 22, 2011 12:26:11 pm Andriy Gapon wrote:
>
> I would like to propose to narrow stop_cpus*() interface:
>
> 1. Remove cpu mask/set parameter. Rationale for this is presented below in a
> forwarded message from a private discussion. You may also see that currently
> stop_cpus*() functions are always called with either (1) other_cpus mask or (2)
> other_cpus & ~stopped_cpus mask, where (2) is really equivalent to (1) because of (1).
>
> 2. Change the return type to void. Currently the return value of stop_cpus*()
> is never checked, and it cannot really be handled meaningfully anyway. A simple
> boolean or errno return value cannot convey which target CPUs were already
> stopped, which failed to become stopped, and why. I think that it's better to
> assume that stop_cpus*() should never fail and to add the necessary diagnostics
> to catch cases where it does fail.
>
> The below forwarded message provides my thoughts on CPU stopping semantics and
> additionally presents my analysis of CPU stopping code in OpenSolaris.
>
> -------- Original Message --------
> on 12/05/2011 21:17 Andriy Gapon said the following:
> > cpu_hard_stop stops other CPUs in a hard way. At least on some archs this is
> > really so, e.g. via x86 NMI. This means that the stopped CPUs, or rather the
> > threads that were running on them, can be stopped in any kind of context with
> > any kind of lock held, including spinlocks. Given that fact, it is really
> > unsafe to continue using any locks after even one CPU is hard-stopped. So any
> > remaining running CPUs should be put into a special non-locking mode. This is
> > the reason that we invent things like THREAD_PANICED() and use polling mode
> > in kdb context, etc.
> > But having more than one CPU, in fact even more than one thread, running in
> > non-locking mode is unsafe again - if those CPUs continue execution without
> > any synchronization, they will corrupt shared data.
> > Thus, I argue that hard stopping should leave only one CPU and thread running.
>
> Some more thoughts.
>
> I think that the above reasoning even applies, to a certain degree, to the
> current soft stopping. Soft stopping would not leave any spinlocks held, true,
> but it can still leave other kinds of locks held, e.g. regular mutexes and sx
> locks. And that also produces a very special environment in the end.
> So in my opinion the current soft stopping should also always stop all other CPUs.
>
> I think that eventually we will need a "really soft" graceful stopping
> mechanism. That mechanism would rebind all interrupts away from the CPU being
> stopped, migrate all (non-special) threads away from the CPU, instruct the
> scheduler not to run any threads on the CPU, remove it from any active CPU
> sets, etc. Now, this mechanism should really be of a targeted variety, no doubt.
>
>
> I also would like to share some of my observations of OpenSolaris code.
> This is not to try to give any support to my proposals - after all we are not
> Solaris, but FreeBSD - but simply to share some ideas.
>
> In OpenSolaris I've noticed three separate CPU stopping mechanisms so far. I am
> sure that they have more :-)
>
> 1. Stopping by the debugger. This is very similar to our hard stopping (in
> their x86 code[*]). All other CPUs are always stopped. One difference is that
> the stopped CPUs run a special command loop while spinning. The master CPU can
> send a few commands to the slave CPUs. Examples: the master can tell a slave,
> if it's the BSP, to reset the system; the master can tell a slave to become the
> new master (I think that this is somewhat equivalent to the "thread N" command in gdb).
> All commands:
> #define KMDB_DPI_CMD_RESUME_ALL 1 /* Resume all CPUs */
> #define KMDB_DPI_CMD_RESUME_MASTER 2 /* Resume only master CPU */
> #define KMDB_DPI_CMD_RESUME_UNLOAD 3 /* Resume for debugger unload */
> #define KMDB_DPI_CMD_SWITCH_CPU 4 /* Switch to another CPU */
> #define KMDB_DPI_CMD_FLUSH_CACHES 5 /* Flush slave caches */
> #define KMDB_DPI_CMD_REBOOT 6 /* Reboot the machine */
>
>
> 2. Stopping for panic. This is very similar to our hard stopping (in their x86
> code[*]). All other CPUs are always stopped. But this is done via different
> code than what the debugger uses; I am not sure why, maybe some historic
> legacy. The difference from our code and from their debugger code is that the
> stopped CPUs run a different stop loop and may do some useful panic work. E.g.
> my understanding is that they can be used for compressing a dump image (yes,
> they compress their dumps, for disk writing speed I guess).
>
> 3. Something remotely similar to our current soft stopping. The big difference
> is that they have special per-CPU "pause" threads. This mechanism activates
> those threads; the threads make themselves non-preemptable, disable interrupts
> and block on some sort of a semaphore until they are told to resume. I am not
> sure what advantage, if any, this mechanism gives them compared to our approach.
> The mechanism is invoked via the pause_cpus() call. It is used mainly to change
> the state of CPUs (some per-CPU data), e.g. when configuring idle hooks or
> power management.
>
> [!] BTW, they also use this mechanism when onlining/offlining CPUs to avoid
> locking in normal paths. That is, for instance, they stop/pause all CPUs, mark
> a target CPU as offline, and then restart all CPUs. This way they don't need
> any locking when checking (and changing) CPU status. Of course, they also do
> all the reasonable things - unbinding interrupts, moving threads away, etc.
> The mechanism is also used for their checkpoint-resume code (which is used by
> suspend/resume) and in their shutdown/reboot path.
> This CPU stopping mechanism also always stops all other CPUs.
>
>
> [*] Another difference to note is that they don't use NMI for their equivalents
> of our hard stopping. They still have the notion of interrupt levels and
> various spl* stuff. So they just have a normal interrupt with highest priority
> to penetrate protected contexts. E.g. in their equivalent of spinlock_enter()
> they do not outright disable interrupts, but set current level to a special
> 'LOCK' level which inhibits all typical (hardware and IPI) interrupts. This
> mechanism adds another degree of freedom to their implementation; as such it
> complicates the code and logic, but also adds some flexibility.
>
> I hope that there is something useful for you and FreeBSD in this lengthy overview.
I really like the OpenSolaris model; it sounds like you could perhaps merge
1) and 2). The pause thread idea for handling online/offline is quite nice.
On x86 you could have IPI_STOP be non-NMI if we adjusted the TPR (%cr8 on
amd64) instead of using cli/sti for spinlock_enter/exit. However, older i386
CPUs do not support this, so I think this is only practical on amd64 if we
were to go that route. OTOH, I think using an NMI is actually fine (though
we need to do a better job of providing a way to register NMI handlers
instead of the various hacks we currently have).
--
John Baldwin