svn commit: r222942 - stable/8/sys/kern

John Baldwin jhb at
Fri Jun 10 19:03:18 UTC 2011

Author: jhb
Date: Fri Jun 10 19:03:17 2011
New Revision: 222942

  MFC 222254:
  Fix an issue with critical sections and SMP rendezvous handlers.
  Specifically, a critical_exit() call that drops the nesting level to zero
  has a brief window where the pending preemption flag is set and the
  nesting level is set to zero.  This is done purposefully to avoid races
  where a preemption scheduled by an interrupt could be lost otherwise (see
  revision 144777).  However, this does mean that if an interrupt fires
  during this window and enters and exits a critical section, it may preempt
  from the interrupt context.  This is generally fine as the interrupt code
  is careful to arrange critical sections so that they are not exited until
  it is safe to preempt (e.g. interrupts EOI'd and masked if necessary).
  However, the SMP rendezvous IPI handler does not quite follow this rule,
  and in general a rendezvous must never be preempted.  Rendezvous handlers
  are also not permitted to schedule threads to execute, so they will not
  typically trigger preemptions.  SMP rendezvous handlers may use
  spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks,
  but using a spinlock also enters and exits a critical section.  If the
  interrupted top-half code is in the brief window of critical_exit() where
  the nesting level is zero but a preemption is pending, then releasing the
  spinlock can trigger a preemption.  Because we know that SMP rendezvous
  handlers can never schedule a thread, we know that a critical_exit() in
  an SMP rendezvous handler will only preempt in this edge case.  We also
  know that the top-half thread will happily handle the deferred preemption
  once the SMP rendezvous has completed, so the preemption will not be lost.
  This makes it safe to employ a workaround where we use a nested critical
  section in the SMP rendezvous code itself around rendezvous action
  routines to prevent any preemptions during an SMP rendezvous.  The
  workaround intentionally avoids checking for a deferred preemption
  when leaving the critical section on the assumption that if there is a
  pending preemption it will be handled by the interrupted top-half code.
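
  The edge case and the workaround described above can be sketched as a
  small userspace model.  This is only an illustration under stated
  assumptions, not the kernel code: struct model_thread and every model_*
  function are hypothetical stand-ins for struct thread, critical_enter(),
  critical_exit(), a spinlock-using action routine, and
  smp_rendezvous_action(); a "preemption" is merely counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two per-thread fields involved. */
struct model_thread {
	int	td_critnest;	/* critical section nesting level */
	bool	td_owepreempt;	/* a preemption has been deferred */
	int	preempt_count;	/* preemptions taken in this model */
};

static void
model_critical_enter(struct model_thread *td)
{
	td->td_critnest++;
}

/*
 * Simplified critical_exit(): leaving the outermost section performs
 * any deferred preemption.
 */
static void
model_critical_exit(struct model_thread *td)
{
	if (td->td_critnest == 1) {
		td->td_critnest = 0;
		if (td->td_owepreempt) {
			td->td_owepreempt = false;
			td->preempt_count++;
		}
	} else
		td->td_critnest--;
}

/* An action routine that briefly holds a spinlock-like section. */
static void
model_action(struct model_thread *td)
{
	model_critical_enter(td);
	model_critical_exit(td);
}

/*
 * Run the action as if the IPI had interrupted critical_exit() in its
 * window.  Returns the number of preemptions taken inside the handler.
 */
static int
model_rendezvous(struct model_thread *td, bool use_workaround)
{
	int before = td->preempt_count;
	bool owepreempt = td->td_owepreempt;

	if (use_workaround)
		td->td_critnest++;	/* nested section, entered by hand */
	model_action(td);
	if (use_workaround) {
		/* Deliberately no check of td_owepreempt on the way out. */
		td->td_critnest--;
		assert(owepreempt == td->td_owepreempt);
	}
	(void)owepreempt;
	return (td->preempt_count - before);
}

/* The interrupted top-half code resumes and handles the deferral. */
static void
model_top_half_resume(struct model_thread *td)
{
	if (td->td_critnest == 0 && td->td_owepreempt) {
		td->td_owepreempt = false;
		td->preempt_count++;
	}
}
```

  Starting from td_critnest == 0 with td_owepreempt set (the window in
  question), the model preempts inside the handler when the workaround is
  off.  With the workaround on, the handler leaves both fields untouched
  and the preemption is taken later by model_top_half_resume().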

Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)

Modified: stable/8/sys/kern/subr_smp.c
--- stable/8/sys/kern/subr_smp.c	Fri Jun 10 18:58:48 2011	(r222941)
+++ stable/8/sys/kern/subr_smp.c	Fri Jun 10 19:03:17 2011	(r222942)
@@ -340,11 +340,15 @@ restart_cpus(cpumask_t map)
 void
 smp_rendezvous_action(void)
 {
+	struct thread *td;
 	void *local_func_arg;
 	void (*local_setup_func)(void*);
 	void (*local_action_func)(void*);
 	void (*local_teardown_func)(void*);
 	int generation;
+	int owepreempt;
 	/* Ensure we have up-to-date values. */
 	atomic_add_acq_int(&smp_rv_waiters[0], 1);
@@ -359,6 +363,34 @@ smp_rendezvous_action(void)
 	generation = smp_rv_generation;
 	/*
+	 * Use a nested critical section to prevent any preemptions
+	 * from occurring during a rendezvous action routine.
+	 * Specifically, if a rendezvous handler is invoked via an IPI
+	 * and the interrupted thread was in the critical_exit()
+	 * function after setting td_critnest to 0 but before
+	 * performing a deferred preemption, this routine can be
+	 * invoked with td_critnest set to 0 and td_owepreempt true.
+	 * In that case, a critical_exit() during the rendezvous
+	 * action would trigger a preemption which is not permitted in
+	 * a rendezvous action.  To fix this, wrap all of the
+	 * rendezvous action handlers in a critical section.  We
+	 * cannot use a regular critical section however as having
+	 * critical_exit() preempt from this routine would also be
+	 * problematic (the preemption must not occur before the IPI
+	 * has been acknowledged via an EOI).  Instead, we
+	 * intentionally ignore td_owepreempt when leaving the
+	 * critical section.  This should be harmless because we do not
+	 * permit rendezvous action routines to schedule threads, and
+	 * thus td_owepreempt should never transition from 0 to 1
+	 * during this routine.
+	 */
+	td = curthread;
+	td->td_critnest++;
+	owepreempt = td->td_owepreempt;
+	/*
 	 * If requested, run a setup function before the main action
 	 * function.  Ensure all CPUs have completed the setup
 	 * function before moving on to the action function.
@@ -391,14 +423,18 @@ smp_rendezvous_action(void)
 	MPASS(generation == smp_rv_generation);
 	atomic_add_int(&smp_rv_waiters[2], 1);
-	if (local_teardown_func == smp_no_rendevous_barrier)
-                return;
-	while (smp_rv_waiters[2] < smp_rv_ncpus &&
-	    generation == smp_rv_generation)
-		cpu_spinwait();
+	if (local_teardown_func != smp_no_rendevous_barrier) {
+		while (smp_rv_waiters[2] < smp_rv_ncpus &&
+		    generation == smp_rv_generation)
+			cpu_spinwait();
+		if (local_teardown_func != NULL)
+			local_teardown_func(local_func_arg);
+	}
-	if (local_teardown_func != NULL)
-		local_teardown_func(local_func_arg);
+	td->td_critnest--;
+	KASSERT(owepreempt == td->td_owepreempt,
+	    ("rendezvous action changed td_owepreempt"));
