svn commit: r202889 - head/sys/kern

Tue Jan 26 08:58:46 UTC 2010

2010/1/25 John Baldwin <jhb at freebsd.org>:
> On Saturday 23 January 2010 10:54:22 am Attilio Rao wrote:
>> Author: attilio
>> Date: Sat Jan 23 15:54:21 2010
>> New Revision: 202889
>> URL: http://svn.freebsd.org/changeset/base/202889
>>
>> Log:
>>   - Fix a race in sched_switch() of sched_4bsd.
>>     In the case of the thread being on a sleepqueue or a turnstile, the
>>     sched_lock was acquired (without the aid of the td_lock interface) and
>>     the td_lock was dropped. This was going to break locking rules on other
>>     threads willing to access to the thread (via the td_lock interface) and
>>     modify his flags (allowed as long as the container lock was different
>>     by the one used in sched_switch).
>>     In order to prevent this situation, while sched_lock is acquired there
>>     the td_lock gets blocked. [0]
>>   - Merge the ULE's internal function thread_block_switch() into the global
>>     thread_lock_block() and make the former semantic as the default for
>>     thread_lock_block(). This means that thread_lock_block() will not
>>     disable interrupts when called (and consequently thread_unlock_block()
>>     will not re-enabled them when called). This should be done manually
>>     when necessary.
>>     Note, however, that ULE's thread_unblock_switch() is not reaped
>>     because it does reflect a difference in semantic due in ULE (the
>>     td_lock may not be necessarilly still blocked_lock when calling this).
>>     While asymmetric, it does describe a remarkable difference in semantic
>>     that is good to keep in mind.
>
> Does this affect the various #ifdef's for handling the third argument to
> cpu_switch()?  E.g. does 4BSD need to spin if td_lock is &blocked_lock?
>
> Also, BLOCK_SPIN() on x86 is non-optimal.  It should not do cmpxchg in a loop.
> Instead, it should do cmp in a loop, and if the cmp succeeds, then try
> cmpxchg.

[CC'ing imp@ because he made a similar question privately]

Basically it is recounduited to the difference, in terms of runqueue
between sched_4bsd and sched_ule.
sched_ule has one runqueue per-core and sched_4bsd has just one runqueue.
The codepath you are referring to, for the ULE case, is used when
threads need to migrate from one cpu to another.
What should really happen is that threads should be both locked (thus
the runqueues where the in & out threads are present should be both
locked) but it will create cyclic LOR and then a 'blocked_lock' is
used (in order to avoid such LORs).
When you are switching in a thread caming from another CPU you need to
make sure its lock already changed to, possibly, the new runqueue one
(by the CPU where it *was* running that is going to switch it out).
That happens because we have not a global interlock between the CPUs
and thread migration gets a bit trickier.

Said that, the !SMP case just assume there is one runqueue, so this
discussion is not useful (you won't have different runqueues from
where the thread may be caming, but just one protected by sched_lock).
Also sched_4bsd has just one runqueue, thus this discussion should not
apply there.

What the last 4bsd patch does is to keep track of the 'old'
thread_lock and assign it to the switched out thread, which may now be
blocked, when it is safe to do that (after the tlb shootdown in
ia32/amd64 cpu_switch() case).
However, this may generate a bit of confusion, because 'blocked' has a
different meaning between the 2 cases:
* ULE does that in order to prevent LOR between runqueues CPUs.
* 4BSD now does that in order to be able to not end with 2 spinlocks
held in some cases (note: this is not a LOR because there is the well
defined order of sq_lock/ts_lock -> sched_lock) but it makes
cpu_switch() simpler, faster and reduces contention.
It is enough that cpu_switch() does handle the third argument properly
and sets back the old lock when it is safe.
I see, however, that this may be a problem for architectures that were
just supporting 4BSD and not ULE (at least to some extent), like in
the sparc64 case. If an architecture does not ignore the third
argument of cpu_switch() and does set it properly there should be a
problem with sched_4bsd. I should probabilly poke the maintainers of
any arch and check with them if there are further problems.

Finally, I really don't think the BLOCK_SPIN() performance improvement
you suggest should really make a difference (assuming how rarely it
should happen) because, please note, that the cmpxchg is necessary as
we need to enforce a memory barrier when doing the check in anyway (so
the first case should be however an atomic_cmpset_X_Y()).

Last thinking: it is not a very good idea cpu_switch() is made
conditional in regard of schedulers, but there is not an easy way to
do that as long as the safe points for releasing blocked_lock happens
within cpu_switch(). I think that is one of the reasons why some
people (maybe you or Jeff) pushed for having cpu_switch() made in C.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein