smp_rendezvous runs with interrupts and preemption enabled on
unicore systems
John Baldwin
jhb at freebsd.org
Wed Nov 2 18:05:02 UTC 2011
On Monday, October 31, 2011 7:43:03 pm Attilio Rao wrote:
> 2011/10/28 <mdf at freebsd.org>:
> > On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone <rysto32 at gmail.com> wrote:
> >> I'm seeing issues on a unicore systems running a derivative of FreeBSD
> >> 8.2-RELEASE if something calls mem_range_attr_set. It turns out that
> >> the root cause is a bug in smp_rendezvous_cpus. The first part of
> >> smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note
> >> that smp_started is never set to 1 on a unicore system):
> >>
> >> if (!smp_started) {
> >> if (setup_func != NULL)
> >> setup_func(arg);
> >> if (action_func != NULL)
> >> action_func(arg);
> >> if (teardown_func != NULL)
> >> teardown_func(arg);
> >> return;
> >> }
> >>
> >> The problem is that this runs with interrupts enabled, outside of a
> >> critical section. My system runs with device_polling enabled with hz
> >> set to 2500, so its quite easy to wedge the system by having a thread
> >> run mem_range_attr_set. That has to do a smp_rendezvous, and if a
> >> timer interrupt happens to go off half-way through the action_func and
> >> preempt this thread, the system ends up deadlocked(although once it's
> >> wedged, typing at the serial console stands a good chance of unwedging
> >> the system. Go figure).
>
> I'm not entirely sure why this exactly breaks though (do you see that
> happening with a random rendezvous callback or it is always the
> same?), because that just becames a simple function calling on cpu0,
> even if I think that there is still a bug as smp_rendezvous callbacks
> may expect to have interrupts and preemption disabled and the
> short-circuit breaks that assumption.
>
> >> I know that smp_rendezvous was reworked substantially on HEAD, but by
> >> inspection it looks like the bug is still present, as the
> >> short-circuit behaviour is still there.
> >>
> >> I am not entirely sure of the best way to fix this. Is it as simple
> >> as doing a spinlock_enter before setup_func and a spinlock_exit after
> >> teardown_func? It seems to boot fine, but I'm not at all confident
> >> that I understand the nuances of smp_rendezvous to be sure that there
> >> aren't side effects that I don't know about.
> >
> > Looks like Max didn't have time to upstream this fix:
> >
> > struct mtx smp_ipi_mtx;
> > +MTX_SYSINIT(smp_ipi_mtx, &smp_ipi_mtx, "smp rendezvous", MTX_SPIN);
> >
> > ...
> >
> > static void
> > mp_start(void *dummy)
> > {
> >
> > - mtx_init(&smp_ipi_mtx, "smp rendezvous", NULL, MTX_SPIN);
> >
> > ...
> >
> > if (!smp_started) {
> > + mtx_lock_spin(&smp_ipi_mtx);
> > if (setup_func != NULL)
> > setup_func(arg);
> > if (action_func != NULL)
> > action_func(arg);
> > if (teardown_func != NULL)
> > teardown_func(arg);
> > + mtx_unlock_spin(&smp_ipi_mtx);
> > return;
> > }
>
> I don't think that we strictly need the lock here, my preferred
> solution would be (only test-compiled):
> Index: sys/kern/subr_smp.c
> ===================================================================
> --- sys/kern/subr_smp.c (revision 226972)
> +++ sys/kern/subr_smp.c (working copy)
> @@ -415,13 +415,16 @@ smp_rendezvous_cpus(cpuset_t map,
> {
> int curcpumap, i, ncpus = 0;
>
> + /* Look comments in the !SMP case. */
> if (!smp_started) {
> + spinlock_enter();
> if (setup_func != NULL)
> setup_func(arg);
> if (action_func != NULL)
> action_func(arg);
> if (teardown_func != NULL)
> teardown_func(arg);
> + spinlock_exit();
> return;
> }
>
> @@ -666,12 +669,18 @@ smp_rendezvous_cpus(cpuset_t map,
> void (*teardown_func)(void *),
> void *arg)
> {
> + /*
> + * In the !SMP case we just need to ensure the same initial
conditions
> + * as the SMP case.
> + */
> + spinlock_enter();
> if (setup_func != NULL)
> setup_func(arg);
> if (action_func != NULL)
> action_func(arg);
> if (teardown_func != NULL)
> teardown_func(arg);
> + spinlock_exit();
> }
>
> void
> @@ -681,12 +690,15 @@ smp_rendezvous(void (*setup_func)(void *),
> void *arg)
> {
>
> + /* Look comments in the smp_rendezvous_cpus() case. */
> + spinlock_enter();
> if (setup_func != NULL)
> setup_func(arg);
> if (action_func != NULL)
> action_func(arg);
> if (teardown_func != NULL)
> teardown_func(arg);
> + spinlock_exit();
> }
I think this is fine, and probably correct.
--
John Baldwin
More information about the freebsd-current
mailing list