svn commit: r202889 - head/sys/kern

Attilio Rao attilio at freebsd.org
Sat Jan 30 15:22:49 UTC 2010


2010/1/30 Marius Strobl <marius at alchemy.franken.de>:
> On Thu, Jan 28, 2010 at 11:16:55AM +0100, Attilio Rao wrote:
>> 2010/1/27 Marius Strobl <marius at alchemy.franken.de>:
>> > On Tue, Jan 26, 2010 at 08:10:25AM +0100, Attilio Rao wrote:
>> >> 2010/1/26 Rob Farmer <rfarmer at predatorlabs.net>:
>> >> > On Sat, Jan 23, 2010 at 7:54 AM, Attilio Rao <attilio at freebsd.org> wrote:
>> >> >> Author: attilio
>> >> >> Date: Sat Jan 23 15:54:21 2010
>> >> >> New Revision: 202889
>> >> >> URL: http://svn.freebsd.org/changeset/base/202889
>> >> >>
>> >> >> Log:
>> >> >>  - Fix a race in sched_switch() of sched_4bsd.
>> >> >>    In the case of the thread being on a sleepqueue or a turnstile, the
>> >> >>    sched_lock was acquired (without the aid of the td_lock interface) and
>> >> >>    the td_lock was dropped. This was going to break locking rules on other
>> >> >>    threads willing to access the thread (via the td_lock interface) and
>> >> >>    modify its flags (allowed as long as the container lock differed
>> >> >>    from the one used in sched_switch).
>> >> >>    In order to prevent this situation, while sched_lock is acquired
>> >> >>    there, the td_lock is set to blocked_lock. [0]
>> >> >>  - Merge ULE's internal function thread_block_switch() into the global
>> >> >>    thread_lock_block() and make the former's semantics the default for
>> >> >>    thread_lock_block(). This means that thread_lock_block() will not
>> >> >>    disable interrupts when called (and consequently thread_unlock_block()
>> >> >>    will not re-enable them when called). This should be done manually
>> >> >>    when necessary.
>> >> >>    Note, however, that ULE's thread_unblock_switch() is not reaped
>> >> >>    because it reflects a semantic difference specific to ULE (the
>> >> >>    td_lock may not necessarily still be blocked_lock when it is called).
>> >> >>    While asymmetric, it does describe a notable difference in semantics
>> >> >>    that is good to keep in mind.
>> >> >>
>> >> >>  [0] Reported by:      Kohji Okuno
>> >> >>                        <okuno dot kohji at jp dot panasonic dot com>
>> >> >>  Tested by:            Giovanni Trematerra
>> >> >>                        <giovanni dot trematerra at gmail dot com>
>> >> >>  MFC:                  2 weeks
>> >> >>
>> >> >> Modified:
>> >> >>  head/sys/kern/kern_mutex.c
>> >> >>  head/sys/kern/sched_4bsd.c
>> >> >>  head/sys/kern/sched_ule.c
>> >> >
>> >> > Hi,
>> >> >
>> >> > This commit seems to be causing me a kernel panic on sparc64 - details
>> >> > are in PR 143215. Could you take a look before MFCing this?
>> >>
>> >> I think that the bug may be in cpu_switch() where the mutex parameter
>> >> for sched_4bsd is not handled correctly.
>> >> Does sparc64 support ULE? I don't think it does, and I think that it
>> >> simply ignores the third argument of cpu_switch(), which is now vital
>> >> for sched_4bsd too (what needs to happen is to take the passed mutex
>> >> and set the TD_LOCK of the old thread to that third argument).
>> >> Unfortunately, I can't write that sparc64 asm right now, but it
>> >> should not be too difficult to cope with.
>> >>
>> >
>> > The following patch adds handling of the mutex parameter to the
>> > sparc64 cpu_switch():
>> > http://people.freebsd.org/~marius/sparc64_cpu_switch_mtx.diff
>> > This patch works fine with r202888. With r202889 it allows the
>> > machine to boot again; however, putting some load on the machine
>> > causes it to issue a reset without a chance to debug. I've also
>> > tried some variations, like duplicating the old cpu_switch()
>> > for cpu_throw() so the altered cpu_switch() doesn't need to
>> > distinguish between the two cases, and only assigning old->td_lock
>> > right before returning, but nothing made a difference. Given that
>> > this leaves little room for a bug in the cpu_switch() changes, I
>> > suspect r202889 also breaks additional assumptions. For example,
>> > the sparc64 pmap code uses sched_lock; does that need to change
>> > to td_lock now, maybe? Is there anything else that comes to
>> > your mind in this regard?
>>
>> Sorry for being of little use with sparc64 assembly (so I can't offer
>> much more productive help here), but the required patch, for sched_4bsd
>> only, should simply save the extra argument of cpu_switch() (cpu_throw()
>> is not involved, so I'm not sure what is changing there) and store it
>> into TD_LOCK(%oldthreadreg) when it is safe to do so (just after the
>> old thread has switched out completely). It doesn't even require a
>> memory barrier.
>> This patch seems a bit too big, and I wonder what else it does (or I'm
>> entirely wrong and it is just what I asked for here); maybe it adds
>> ULE support as well?
>
> Actually it just adds old->td_lock = mtx in a non-atomic fashion
> as soon as we're done with the old thread. It's "big" because I had to
> reshuffle the register usage in order to preserve %i0 (old) and
> %i2 (mtx) and in order to distinguish between cpu_switch() and
> cpu_throw() (no mtx, and old may be NULL in that case). As it
> turns out it also works just fine; the problems I was seeing
> were due to another change in that tree. Sorry for the noise.
>
> My understanding is that for ULE, mtx should be assigned to
> old->td_lock atomically, is that correct?

More precisely, what needs to happen is to use a memory barrier. If you
can emulate that without an atomic instruction (on sparc64), that's
fine as well.
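
To illustrate, here is a rough C-level sketch of the contract I think
cpu_switch() has to honor once the old thread's context is saved (the
real code is sparc64 asm; atomic_store_rel_ptr() is the MI primitive I
would expect to match here, so treat the details as an assumption
rather than the final form):

    /* Sketch of cpu_switch(oldtd, newtd, mtx); the real MD code is asm.
     * Needs <sys/param.h>, <sys/proc.h> and <machine/atomic.h>. */
    static void
    cpu_switch_sketch(struct thread *oldtd, struct thread *newtd,
        struct mtx *mtx)
    {
            /* ... save oldtd's register context ... */

            /*
             * Publish the new container lock of the old thread only
             * after its context has been fully saved.  The release
             * barrier ensures a CPU spinning on oldtd->td_lock
             * (blocked_lock) cannot pick the thread up before the
             * saved context is visible.
             */
            atomic_store_rel_ptr((volatile uintptr_t *)&oldtd->td_lock,
                (uintptr_t)mtx);

            /* ... load newtd's context and return on its stack ... */
    }

For sched_4bsd alone a plain store would do, as said above; the barrier
only matters for ULE, where another CPU may be spinning on blocked_lock.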

>> That said, all the code, including MD parts, should always use the
>> td_lock interface rather than doing explicit acquisitions/drops of
>> sched_lock if it wants to support ULE (but probably even if it does
>> not), unless there is a compelling reason (which I would expect to be
>> justified in comments, at least).
>
> I think the idea behind using sched_lock in the sparc64 code is
> to piggyback on it for protecting the pmap and take advantage of
> the fact that it's held across cpu_switch() anyway. If that's
> correct, it should be possible to replace it with a separate
> spinlock dedicated to protecting the pmap or, given that the
> macro madness involved in mtx_{,un}lock_spin() makes these hard
> to call properly from asm, to call spinlock_{enter,exit}()
> directly.

Even if that is the case, there is no reason not to call
thread_lock()/thread_unlock() (which will acquire the correct spinlock,
or perform the handover the right way under sched_4bsd) and keep a
unified spinlock, in order to keep cpu_switch() simple while still
offering pmap protection across context switches.
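
Concretely, instead of something like this in the pmap code (a sketch,
not a quote of the actual sparc64 sources):

    mtx_lock_spin(&sched_lock);
    /* ... touch pmap state ... */
    mtx_unlock_spin(&sched_lock);

it should go through the td_lock interface (curthread here is only a
stand-in for whichever thread is involved):

    thread_lock(curthread);
    /* ... touch pmap state ... */
    thread_unlock(curthread);

thread_lock() resolves to the thread's current container lock, so the
same code keeps working under ULE's per-CPU locks as well as under
sched_4bsd's global sched_lock.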

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

