New cpu_switch() and cpu_throw().
Jeff Roberson
jroberson at chesapeake.net
Tue Jun 12 06:59:43 UTC 2007
I have decided to abandon the architecture-specific cpu_throw() in favor
of an approach that is completely MI. The new approach will be to use an
atomic int in the proc to track the number of threads in cpu_throw(). The
waiting thread will just yield until that count hits 0, which will likely
require at most one context switch, as there is no opportunity for a
thread to block after the count is bumped.
I'm going to commit this change fairly quickly, as a couple of people have
reported hitting this exit race on 4-8 core machines. You can review the
diff here:
http://people.freebsd.org/~jeff/exitthread.diff
The cpu_throw() changes will still be required for architectures which
wish to continue to support ULE. This is advisable for any arch that
supports SMP.
Thanks,
Jeff
On Tue, 5 Jun 2007, Jeff Roberson wrote:
> On Tue, 5 Jun 2007, John Baldwin wrote:
>
>> On Tuesday 05 June 2007 01:32:46 am Jeff Roberson wrote:
>>> For every architecture we need to support new features in cpu_switch()
>>> and cpu_throw() before they can support per-cpu schedlock. I'll describe
>>> those below. I'm soliciting help or advice in implementing these on
>>> platforms other than x86 and amd64, especially on ia64, where things are
>>> implemented in C!
>>>
>>> I checked in the new version of cpu_switch() for amd64 today after
>>> threadlock went in. Basically, we have to release a thread's lock when
>>> it's switched out and acquire a lock when it's switched in.
>>>
>>> The release must happen after we're totally done with the stack and
>>> vmspace of the thread to be switched out. On amd64 this meant after we
>>> clear the active bits for tlb shootdown. The release actually makes use
>>> of a new 'mtx' argument to cpu_switch() and sets the td_lock pointer to
>>> this argument rather than unlocking a real lock. td_lock has previously
>>> been set to the blocked lock, which is always blocked. Threads
>>> spinning in thread_lock() will notice the td_lock pointer change and
>>> acquire the new lock. So this is simple, just a non-atomic store with a
>>> pointer passed as an argument. On amd64:
>>>
>>> movq %rdx, TD_LOCK(%rdi) /* Release the old thread */
>>>
>>> The acquire part is slightly more complicated and involves a little loop.
>>> We don't actually have to spin trying to lock the thread. We just spin
>>> until it's no longer set to the blocked lock. The switching thread
>>> already owns the per-cpu scheduler lock for the current cpu. If we're
>>> switching into a thread that is set to the blocked_lock another cpu is
>>> about to set it to our current cpu's lock via the mtx argument mentioned
>>> above. On amd64 we have:
>>>
>>> /* Wait for the new thread to become unblocked */
>>> movq $blocked_lock, %rdx
>>> 1:
>>> movq TD_LOCK(%rsi),%rcx
>>> cmpq %rcx, %rdx
>>> je 1b
>>
>> If this is to handle a thread migrating from one CPU to the next (and
>> there's no interlock to control migration, otherwise you wouldn't have to
>> spin here) then you will need memory barriers on the first write (i.e. the
>> first write above should be an atomic_store_rel()) and the equivalent of
>> an _acq barrier here.
>
> So, thanks for pointing this out. Attilio also mentions that on x86 and
> amd64 we need a pause in the wait loop. As we discussed, we can just use
> sfence rather than atomics on amd64; however, x86 will need atomics since
> you can't rely on the presence of *fence. Other architectures will have to
> ensure memory ordering as appropriate.
>
> Jeff
>
>>
>> --
>> John Baldwin
>>