stop_cpus*() interface
John Baldwin
jhb at freebsd.org
Thu Jun 23 12:51:59 UTC 2011
On Wednesday, June 22, 2011 12:26:11 pm Andriy Gapon wrote:
>
> I would like to propose to narrow stop_cpus*() interface:
>
> 1. Remove cpu mask/set parameter. Rationale for this is presented below in a
> forwarded message from a private discussion. You may also see that currently
> stop_cpus*() functions are always called with either (1) other_cpus mask or (2)
> other_cpus & ~stopped_cpus mask, where (2) is really equivalent to (1) because of (1).
>
> 2. Change the return type to void. Currently the return value of stop_cpus*()
> is never checked, and it cannot really be handled meaningfully anyway. A simple
> boolean or errno return value cannot convey which target CPUs were already
> stopped, which failed to become stopped, and why. I think that it's better to
> assume that stop_cpus*() should never fail and to add the necessary diagnostics
> to catch cases where it does fail.
>
> The below forwarded message provides my thoughts on CPU stopping semantics and
> additionally presents my analysis of CPU stopping code in OpenSolaris.
>
> -------- Original Message --------
> on 12/05/2011 21:17 Andriy Gapon said the following:
> > cpu_hard_stop stops other CPUs in a hard way. At least on some archs this is
> > really so, e.g. via x86 NMI. This means that the stopped CPUs, or rather the
> > threads that were running on them, can be stopped in any kind of context with
> > any kind of lock held, including spinlocks. Given that fact, it is really
> > unsafe to continue using any locks after even one CPU is hard-stopped. So any
> > remaining running CPUs should be put into a special non-locking mode. This is
> > the reason that we invent things like THREAD_PANICED() and use polling mode
> > in kdb context, etc.
> > But having more than one CPU, in fact even more than one thread, running in
> > non-locking mode is unsafe again - if those CPUs continue execution without
> > any synchronization, they will corrupt shared data.
> > Thus, I argue that hard stopping should leave only one CPU and thread running.
>
> Some more thoughts.
>
> I think that the above reasoning even applies, to a certain degree, to the
> current soft stopping. Soft stopping would not leave any spinlocks held, true,
> but it can still leave other kinds of locks held, e.g. regular mutexes and sx
> locks. And that also produces a very special environment in the end.
> So in my opinion the current soft stopping should also always stop all other CPUs.
>
> I think that eventually we will need a "really soft" graceful stopping
> mechanism. That mechanism would rebind all interrupts away from the CPU being
> stopped, migrate all (non-special) threads away from the CPU, instruct the
> scheduler not to run any threads on the CPU, remove it from any active CPU
> sets, etc. Now, this mechanism should really be of a targeted variety, no doubt.
>
>
> I also would like to share some of my observations of OpenSolaris code.
> This is not to try to give any support to my proposals - after all we are not
> Solaris, but FreeBSD - but simply to share some ideas.
>
> In OpenSolaris I've noticed three separate CPU stopping mechanisms so far. I am
> sure that they have more :-)
>
> 1. Stopping by the debugger. This is very similar to our hard stopping (in
> their x86 code[*]). All other CPUs are always stopped. One difference is that
> the stopped CPUs run a special command loop while spinning. The master CPU can
> send a few commands to the slave CPUs. Examples: the master can tell a slave,
> if it's the BSP, to reset the system; the master can tell a slave to become the
> new master (I think that this is somewhat equivalent to the "thread N" command in gdb).
> All commands:
> #define KMDB_DPI_CMD_RESUME_ALL 1 /* Resume all CPUs */
> #define KMDB_DPI_CMD_RESUME_MASTER 2 /* Resume only master CPU */
> #define KMDB_DPI_CMD_RESUME_UNLOAD 3 /* Resume for debugger unload */
> #define KMDB_DPI_CMD_SWITCH_CPU 4 /* Switch to another CPU */
> #define KMDB_DPI_CMD_FLUSH_CACHES 5 /* Flush slave caches */
> #define KMDB_DPI_CMD_REBOOT 6 /* Reboot the machine */
>
>
> 2. Stopping for panic. This is very similar to our hard stopping (in their x86
> code[*]). All other CPUs are always stopped. But this is done via different
> code than what the debugger uses; I am not sure why, maybe some historic
> legacy. The difference from our code and from their debugger code is that the
> stopped CPUs run a different stop loop and may do some useful panic work. E.g.
> my understanding is that they can be used for compressing a dump image (yes,
> they compress their dumps, for disk writing speed I guess).
>
> 3. Something remotely similar to our current soft stopping. The big difference
> is that they have special per-CPU "pause" threads. This mechanism activates
> those threads; the threads make themselves non-preemptable, disable interrupts
> and block on some sort of a semaphore until they are told to resume. I am not
> sure what advantage, if any, this mechanism gives them compared to our approach.
> The mechanism is invoked via the pause_cpus() call. It is used mainly to change
> the state of CPUs (some per-CPU data), e.g. when configuring idle hooks or
> power management.
>
> [!] BTW, they also use this mechanism when onlining/offlining CPUs to avoid
> locking in normal paths. That is, for instance, they stop/pause all CPUs, mark
> a target CPU as offline, and then restart all CPUs. This way they don't need
> any locking when checking (and changing) CPU status. Of course, they also do
> all the reasonable things - unbinding interrupts, moving threads away, etc.
> The mechanism is also used for their checkpoint-resume code (which is used by
> suspend/resume) and in their shutdown/reboot path.
> This CPU stopping mechanism also always stops all other CPUs.
>
>
> [*] Another difference to note is that they don't use NMI for their equivalents
> of our hard stopping. They still have the notion of interrupt levels and
> various spl* stuff. So they just have a normal interrupt with highest priority
> to penetrate protected contexts. E.g. in their equivalent of spinlock_enter()
> they do not outright disable interrupts, but set current level to a special
> 'LOCK' level which inhibits all typical (hardware and IPI) interrupts. This
> mechanism adds another degree of freedom to their implementation; as such it
> complicates the code and logic, but also adds some flexibility.
>
> I hope that there is something useful for you and FreeBSD in this lengthy overview.
I really like the OpenSolaris model; it sounds like you could perhaps merge
1) and 2). The pause thread idea for handling online/offline is quite nice.
On x86 you could have IPI_STOP be non-NMI if we adjusted the TPR (%cr8 on
amd64) instead of using cli/sti for spinlock_enter/exit. However, older i386
CPUs do not support this, so I think this is only practical on amd64 if we
were to go that route. OTOH, I think using an NMI is actually fine (though
we need to do a better job of providing a way to register NMI handlers
instead of the various hacks we currently have).
--
John Baldwin