sched_pin() bug in SCHED_ULE
John Baldwin
jhb at freebsd.org
Wed Sep 1 13:50:02 UTC 2010
On Tuesday, August 31, 2010 2:53:12 pm mdf at freebsd.org wrote:
> On Tue, Aug 31, 2010 at 10:16 AM, <mdf at freebsd.org> wrote:
> > I recorded the stack any time ts->ts_cpu was set and when a thread was
> > migrated by sched_switch() I printed out the recorded info. Here's
> > what I found:
> >
> >
> > XXX bug 67957: moving 0xffffff003ff9b800 from 3 to 1
> > [1]: pin 0 state 4 move 3 -> 1 done by 0xffffff000cc44000:
> > #0 0xffffffff802b36b4 at bug67957+0x84
> > #1 0xffffffff802b5dd4 at sched_affinity+0xd4
> > #2 0xffffffff8024a707 at cpuset_setthread+0x137
> > #3 0xffffffff8024aeae at cpuset_setaffinity+0x21e
> > #4 0xffffffff804a82df at freebsd32_cpuset_setaffinity+0x4f
> > #5 0xffffffff80295f49 at isi_syscall+0x99
> > #6 0xffffffff804a630e at ia32_syscall+0x1ce
> > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60
> > [0]: pin 0 state 2 move 0 -> 3 done by 0xffffff000cc44000:
> > #0 0xffffffff802b36b4 at bug67957+0x84
> > #1 0xffffffff802b4ad8 at sched_add+0xe8
> > #2 0xffffffff8029b96a at create_thread+0x34a
> > #3 0xffffffff8029badc at kern_thr_new+0x8c
> > #4 0xffffffff804a8912 at freebsd32_thr_new+0x122
> > #5 0xffffffff80295f49 at isi_syscall+0x99
> > #6 0xffffffff804a630e at ia32_syscall+0x1ce
> > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60
> >
> > So one thread in the process called cpuset_setaffinity(2), and another
> > thread in the process was forcibly migrated by the IPI while returning
> > from a syscall, while it had td_pinned set.
> >
> > Given this path, it seems reasonable to me to skip the migrate if we
> > notice THREAD_CAN_MIGRATE is false.
> >
> > Opinions? My debug code is below. I'll try to write a short testcase
> > that exhibits this bug.
>
> Just a few more thoughts on this. The check in sched_affinity for
> THREAD_CAN_MIGRATE is racy. Since witness uses sched_pin, it's not
> simple to take the THREAD lock around an increment of td_pinned. So
> I'm looking for suggestions on the best way to fix this issue. My
> thoughts:
>
> 1) add a check in sched_switch() for THREAD_CAN_MIGRATE
> 2) have WITNESS not use sched_pin, and take the THREAD lock when
> modifying td_pinned
> 3) have the IPI_PREEMPT handler notice the thread is pinned (and not
> trying to bind) and postpone the mi_switch, just like it postpones
> when a thread is in a critical section.
>
> Except for the potential complexity of implementation, I think (2) is
> the best solution.
I don't like 2) only because you'd really need to fix all the other places
that use sched_pin() to grab the thread lock. That would be a bit expensive.
I think the problem is code that checks THREAD_CAN_MIGRATE() for running
threads that aren't curthread.
The sched_affinity() bug is probably my fault FWIW. I think something along
the lines of 1) is the best approach. Maybe something like the patch below.
It moves the CPU assignment out of sched_affinity() and moves it into
sched_switch() instead so that it takes effect on the next context switch for
this thread. That is the point at which the CPU binding actually takes effect
anyway since sched_affinity() doesn't force an immediate context switch. This
patch is relative to 7. The patch for 9 is nearly identical (just a fixup for
ipi_cpu() vs ipi_selected()).
Index: sched_ule.c
===================================================================
--- sched_ule.c (revision 212065)
+++ sched_ule.c (working copy)
@@ -1885,6 +1885,8 @@
srqflag = (flags & SW_PREEMPT) ?
SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
SRQ_OURSELF|SRQ_YIELDING;
+ if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->tscpu))
+ ts->ts_cpu = sched_pickcpu(td, 0);
if (ts->ts_cpu == cpuid)
tdq_add(tdq, td, srqflag);
else
@@ -2536,7 +2538,6 @@
{
#ifdef SMP
struct td_sched *ts;
- int cpu;
THREAD_LOCK_ASSERT(td, MA_OWNED);
ts = td->td_sched;
@@ -2550,17 +2551,13 @@
if (!TD_IS_RUNNING(td))
return;
td->td_flags |= TDF_NEEDRESCHED;
- if (!THREAD_CAN_MIGRATE(td))
- return;
/*
- * Assign the new cpu and force a switch before returning to
- * userspace. If the target thread is not running locally send
- * an ipi to force the issue.
+ * Force a switch before returning to userspace. If the
+ * target thread is not running locally send an ipi to force
+ * the issue.
*/
- cpu = ts->ts_cpu;
- ts->ts_cpu = sched_pickcpu(td, 0);
- if (cpu != PCPU_GET(cpuid))
- ipi_selected(1 << cpu, IPI_PREEMPT);
+ if (td != curthread)
+ ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT);
#endif
}
--
John Baldwin
More information about the freebsd-current
mailing list