can the scheduler decide to schedule an interrupted but runnable thread on another CPU core? What are the implications for code?

Adrian Chadd adrian at freebsd.org
Thu Feb 13 23:57:34 UTC 2014


Hi,

Whilst digging into collisions in the flowtable code I discovered that
a bunch of them are due to some of the flowtable_lookup() code ending up
running on a different CPU from the one it started on - contention on
the l2/l3 lookup lock(s) was enough to block threads, giving the
scheduler an obvious chance to migrate them.

So this led me to wonder whether, in a fully preemptive kernel, a
running kernel thread would stay on the current CPU until it hit one of
a very specific subset of things (exited to userland, hit a lock, etc.).

Apparently (according to kan and rwatson) this is how we define fully
preemptive - it's not just that a running task is interruptible at
almost any point, but that it may be interrupted and rescheduled on a
different CPU anywhere outside of specific critical sections.
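
To make the distinction concrete, here is a minimal userland model in
C. It is not kernel code: model_thread, model_pin(), model_unpin(),
model_try_migrate() and index_still_valid() are hypothetical names made
up for illustration. The only thing borrowed from the real kernel is
the idea that sched_pin() keeps a thread on its current CPU until
sched_unpin(), without stopping it from being preempted.

```c
#include <stdbool.h>

/* Hypothetical userland model of a kernel thread; not FreeBSD code. */
struct model_thread {
    int cpu;     /* CPU the thread is currently running on */
    int pinned;  /* nesting count, loosely modelled on sched_pin() */
};

/*
 * Model of sched_pin()/sched_unpin(): while the count is non-zero the
 * thread may still be preempted, but it must not be migrated.
 */
static void model_pin(struct model_thread *td)   { td->pinned++; }
static void model_unpin(struct model_thread *td) { td->pinned--; }

/*
 * Model of the scheduler picking another CPU after a preemption.
 * Returns true if the thread actually moved.
 */
static bool model_try_migrate(struct model_thread *td, int new_cpu)
{
    if (td->pinned > 0 || td->cpu == new_cpu)
        return false;
    td->cpu = new_cpu;
    return true;
}

/*
 * Reads a curcpu-style index, hits a preemption point, then reports
 * whether the cached index still matches the CPU the thread runs on.
 */
static bool index_still_valid(struct model_thread *td, bool use_pin,
    int other_cpu)
{
    if (use_pin)
        model_pin(td);
    int cached = td->cpu;               /* like reading curcpu */
    model_try_migrate(td, other_cpu);   /* preemption point */
    bool valid = (cached == td->cpu);
    if (use_pin)
        model_unpin(td);
    return valid;
}
```

In this model an unpinned thread that starts on CPU 0 ends up on the
other CPU holding a stale index, whereas the pinned variant keeps its
index valid for the whole section.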

This means that for the flowtable case, the current setup (without
atomics to maintain the lists) can only work if all threads doing work
with the flowtable structures (i.e. lookup, insert, purge) are CPU
pinned. Otherwise we may have the situation where:

sequentially:

* lookup occurs on CPU A;
* lookup succeeds on CPU A for some almost-expired entry;
* preemption occurs, and it gets scheduled to CPU B;

then simultaneously:

* CPU A's flowtable purge code runs, and decides to purge entries
including the current one;
* the code now running on CPU B has an item from the CPU A flowtable,
and dereferences it as it's being freed, leading to potential badness.
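
The interleaving above can be replayed deterministically in a small
userland model, again in C and again not kernel code. Everything here
(model_flow, model_table, model_lookup(), model_purge(),
model_replay_race()) is a hypothetical stand-in and not the real
flowtable API. CPU A is index 0 and CPU B is index 1, the
"simultaneous" steps are simply run one after the other, and a "freed"
flag stands in for the memory being returned to the allocator so the
model never touches freed memory for real.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NCPU 2

/* Hypothetical flow entry; not the real flowtable structure. */
struct model_flow {
    int  expires;   /* tick at which the entry becomes stale */
    bool freed;     /* stands in for "memory has been freed" */
};

/* One table per CPU, one slot each, to keep the model small. */
static struct model_flow model_table[MODEL_NCPU];

/* Lookup against the table of the CPU the caller is running on. */
static struct model_flow *model_lookup(int cpu, int now)
{
    struct model_flow *fle = &model_table[cpu];

    return (!fle->freed && fle->expires > now) ? fle : NULL;
}

/* Per-CPU purge: "frees" the entry once it has expired. */
static void model_purge(int cpu, int now)
{
    if (model_table[cpu].expires <= now)
        model_table[cpu].freed = true;
}

/*
 * Replays the interleaving. Returns true if the thread ends up
 * holding a pointer to an entry that has since been freed.
 */
static bool model_replay_race(void)
{
    model_table[0] = (struct model_flow){ .expires = 10, .freed = false };

    int cpu = 0;                                    /* running on CPU A */
    struct model_flow *fle = model_lookup(cpu, 9);  /* almost-expired hit */
    cpu = 1;                                        /* preempted, now on CPU B */
    model_purge(0, 10);                             /* CPU A purges its table */
    (void)cpu;                                      /* thread still holds fle */

    return fle != NULL && fle->freed;
}
```

In the model, model_replay_race() reports that the thread is left
holding a freed entry; in the kernel the equivalent dereference would
be a use-after-free.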

Now, it's a ridiculously small window of opportunity, but I'd rather
the code be written to be correct and mostly-fast versus very fast and
potentially exploding. I'm sure those in operations would agree. :-)

So, my questions:

* is this actually how fully preemptive kernels _may_ behave?
* I believe there's a difference between what 4BSD and ULE will do
here - is this the case? What are the scheduler behaviours?
* are there any other areas in the kernel that rely on pcpu UMA zones
/ curcpu indexes outside of a sched_pin() section?

Thanks,



-a


More information about the freebsd-arch mailing list