SCHED_ULE race condition, fix proposal

John Baldwin jhb at freebsd.org
Wed Jan 27 17:52:35 UTC 2016


On Wednesday, January 27, 2016 06:18:16 PM Wojciech Macek wrote:
> Hello,
> 
> I've encountered a very nasty race condition during debugging armv8 HWPMC.
> It seems that ULE scheduler can execute the same thread on two different
> CPUs at the same time...
> 
> Here is the scenario.
> The PMC driver must execute some of the code on the CPU0. To ensure that, a
> process migration is triggered as following:
> 
> 
> thread_lock(curthread);
> sched_bind(curthread, cpu);
> thread_unlock(curthread);
> 
> KASSERT(curthread->td_oncpu == cpu,
>  ("[pmc,%d] CPU not bound [cpu=%d, curr=%d]", __LINE__,
>  cpu, curthread->td_oncpu));
> 
> 
> That causes the context switch and (finally) execution of sched_switch()
> function. The code correctly detects migration and calls
> sched_switch_migrate. That function is supposed to add current thread to
> the runqueue of another CPU ("tdn" variable). So it does:
> 
> tdq_lock_pair(tdn, tdq);
> tdq_add(tdn, td, flags);
> tdq_notify(tdn, td);
> TDQ_UNLOCK(tdn);
> spinlock_exit();
> 
> 
> But that sometimes is causing a crash, because the other CPU is staring to
> process mi_switch as soon as the IPI arrives (via tdq_notify) and the
> runqueue lock is released. The problem is, that the thread does not contain
> valid register set, because its context was not yet stored - that happens
> later in machine dependent cpu_switch function. In another words, the
> sched_switch run on the CPU we want the thread to migrate onto restores
> thread context before it was actually stored on another core - that causes
> setting regs/pc/lt to some junk data and crash.
> 
> 
> I'd like to discuss a possible solution for this. I think it would be
> reasonable to extend cpu_switch to be capable of releasing a lock as the
> last thing it does after storing everything into the PCB. We could then
> remove the "TDQ_UNLOCK(tdn);" from the sched_switch_migrate and be sure
> that in the situation of migration nobody is allowed to touch the target
> runqueue until the migrating process finishes storing its context.
> 
> But first I'd like to discuss some possible alternatives and maybe find
> another solution, because any change in this area will impact all supported
> architectures.

This belongs on hackers, not developers at .

cpu_switch() already does what you describe though in a slightly different
way.  The thread_lock() of a thread being switched out is set to blocked_lock.
cpu_switch() on the new CPU will always spin until cpu_switch updates
thread_lock of the old thread to point to the proper runq lock after saving
its state in the pcb.  arm64 does this here:

        /*
         * Release the old thread. This doesn't need to be a store-release
         * as the above dsb instruction will provide release semantics.
         */
        str     x2, [x0, #TD_LOCK]
#if defined(SCHED_ULE) && defined(SMP)
        /* Read the value in blocked_lock */
        ldr     x0, =_C_LABEL(blocked_lock)
        ldr     x2, [x0]
1:
        ldar    x3, [x1, #TD_LOCK]
        cmp     x3, x2
        b.eq    1b
#endif

Note the thread_lock_block() call just above the block you noted from
sched_switch_migrate() to see where td_lock is set to &blocked_lock.

If the comment about 'dsb' above is wrong that might explain why you see
stale state in the PCB after seeing the new value of td_lock.

-- 
John Baldwin


More information about the freebsd-arm mailing list