New cpu_switch() and cpu_throw().
Jeff Roberson
jroberson at chesapeake.net
Tue Jun 12 06:59:43 UTC 2007
I have decided to abandon the architecture-specific cpu_throw() in favor
of an approach that is completely MI. The new approach will be to use an
atomic int in the proc to track the number of threads in cpu_throw(). The
waiting thread will just yield until that count hits 0, which will likely
require at most one context switch, as there is no opportunity for a
thread to block after the count is bumped.
I'm going to commit this change fairly quickly, as a couple of people have
reported hitting this exit race on 4-8 core machines. You can review the
diff here:
http://people.freebsd.org/~jeff/exitthread.diff
The cpu_throw() changes will still be required for architectures which
wish to continue to support ULE. This is advisable for any arch that
supports SMP.
Thanks,
Jeff
On Tue, 5 Jun 2007, Jeff Roberson wrote:
> On Tue, 5 Jun 2007, John Baldwin wrote:
>
>> On Tuesday 05 June 2007 01:32:46 am Jeff Roberson wrote:
>>> For every architecture we need to support new features in cpu_switch()
>>> and cpu_throw() before they can support per-cpu schedlock. I'll describe
>>> those below. I'm soliciting help or advice in implementing these on
>>> platforms other than x86 and amd64, especially on ia64, where things are
>>> implemented in C!
>>>
>>> I checked in the new version of cpu_switch() for amd64 today after
>>> threadlock went in. Basically, we have to release a thread's lock when
>>> it's switched out and acquire a lock when it's switched in.
>>>
>>> The release must happen after we're totally done with the stack and
>>> vmspace of the thread to be switched out. On amd64 this meant after we
>>> clear the active bits for tlb shootdown. The release actually makes use
>>> of a new 'mtx' argument to cpu_switch() and sets the td_lock pointer to
>>> this argument rather than unlocking a real lock. td_lock has previously
>>> been set to the blocked lock, which is always blocked. Threads
>>> spinning in thread_lock() will notice the td_lock pointer change and
>>> acquire the new lock. So this is simple, just a non-atomic store with a
>>> pointer passed as an argument. On amd64:
>>>
>>> movq %rdx, TD_LOCK(%rdi) /* Release the old thread */
>>>
>>> The acquire part is slightly more complicated and involves a little loop.
>>> We don't actually have to spin trying to lock the thread. We just spin
>>> until it's no longer set to the blocked lock. The switching thread
>>> already owns the per-cpu scheduler lock for the current cpu. If we're
>>> switching into a thread that is set to the blocked_lock another cpu is
>>> about to set it to our current cpu's lock via the mtx argument mentioned
>>> above. On amd64 we have:
>>>
>>> /* Wait for the new thread to become unblocked */
>>> movq $blocked_lock, %rdx
>>> 1:
>>> movq TD_LOCK(%rsi),%rcx
>>> cmpq %rcx, %rdx
>>> je 1b
>>
>> If this is to handle a thread migrating from one CPU to the next (and
>> there's no interlock to control migration, otherwise you wouldn't have to
>> spin here) then you will need memory barriers on the first write (i.e. the
>> first write above should be an atomic_store_rel()) and the equivalent of
>> an _acq barrier here.
>
> So, thanks for pointing this out. Attilio also mentions that on x86 and
> amd64 we need a pause in the wait loop. As we discussed, we can just use
> sfence rather than atomics on amd64; however, x86 will need atomics since
> you can't rely on the presence of *fence. Other architectures will have to
> ensure memory ordering as appropriate.
>
> Jeff
>
>>
>> --
>> John Baldwin
>>