pthread switch (was Odd KSE panic)

Julian Elischer julian at elischer.org
Mon Jul 5 07:23:20 PDT 2004



On Mon, 5 Jul 2004, Daniel Eischen wrote:

> On Mon, 5 Jul 2004, Julian Elischer wrote:
> > 
> > On Sun, 4 Jul 2004, Andrew Gallatin wrote:
> > 
> > > The problem turned out to be that the worker thread was sleeping in
> > > cv_wait_sig() on a cv which was used elsewhere in the driver.  When I
> > > fixed this, pretty much everything got better.  I still don't
> > > understand exactly what happened.  I have no idea if the worker woke
> > > too early, or if the other place this cv was (mis)used was where the
> > > early wake happened.  (this would be where mx_free() is called).
> > > 
> > > Anyway, it no longer crashes the machine.   Sorry to have wasted your
> > > time.  
> > > 
> > > Now to figure out why libthr does pthread_cond_signal() in my scenario
> > > 47us faster than libpthread... (ULE, 1 HTT P4 running SMP kernel)
> > > 
> > > Scenario is that the mainline thread sleeps waiting for a packet
> > > (using pthread_cond_timedwait()) and the worker thread is asleep in
> > > kernel.  When a packet arrives, the worker wakes up, returns from the
> > > ioctl, does a pthread_cond_signal() to wakeup the mainline thread, and
> > > goes back into the kernel to sleep (via an ioctl).  This is the sort
> > > of scenario where I thought KSE would be faster than a 1:1 lib..
> 
> ULE & HTT & KSE is probably the most inefficient combination...
> I find true SMP & 4BSD to be better for KSE (perhaps after
> julian finishes his cleanup, things will be better).
> 
> > Actually it is the sort of scenario where it is not clear what is best
> > ;-)
> > 
> > The mainline thread is asleep in userland. Thus, to become active, if
> > there is no running kernel thread to run it, one needs to be woken up
> > from the kernel and made to upcall to userspace to be assigned to run
> > the thread. This is pretty much what is happening in M:N threads. (1:1
> > is in my opinion a non-threaded app :-) Except that in KSE there is
> > more overhead to do it. One thing that could be tuned in KSE to make it
> > a lot faster is to reverse the order that the threads are run, in the
> > following way:
> > 
> > ------ currently:
> > 
> > Kernel thread 1 returns from waiting for the packet, and as user thread
> > A, signals (enters) the UTS to make user thread B runnable. The UTS
> > realises that there is no kernel thread currently running to run thread
> > B, and enters the kernel to wake one up, and returns to userland, in
> > order to re-enter thread A and go back into the kernel to wait for the
> > next packet.
> 
> Actually, the UTS doesn't really think of it in terms of creating
> new kernel threads.  It only sees idle and non-idle KSEs.
> 
> > Kernel thread 2 is started up some time after it is awoken and enters
> > userland via an upcall and is assigned thread B to run.
> > 
> > This is all assuming there are 2 (or more CPUS).
> > If there is only one CPU then when user thread A sleeps in the kernel,
> > the kernel thread (1) is allowed to upcall back to the UTS and take on 
> > user thread B.
> 
> This is true regardless of the number of CPUs, unless you have
> system scope threads.  Any blocking in the kernel will result
> in an upcall.
> 
> > ------ the changed version:
> > 
> > Kernel thread 1 returns from waiting for the packet, and as user thread
> > A, signals (enters) the UTS to make user thread B runnable. The UTS
> > realises that there is no kernel thread currently running to run thread
> > B, so it enters the kernel briefly to make another thread wake up and
> > upcall, and then suspends thread A, and takes on thread B.
> > Eventually the upcall occurs and thread A is assigned to the newly
> > awoken thread and re-enters the kernel, waiting for the next packet.
> > 
> > ------------
> > 
> > Dan, can the 2nd version happen now? 
> > can Drew make it happen by assigning a higher priority to thread B?
> 
> No, not yet; pthread_cond_signal() and pthread_mutex_unlock() are
> not currently preemption points.  I'm not sure that you would want
> pthread_cond_signal() to be a preemption point -- I'd imagine that
> a lot of applications would do something like this:
> 
> 	/* sleeping */
> 	pthread_mutex_lock(&m);
> 	foo->sleeping_count++;
> 	pthread_cond_wait(&cv, &m);
> 	pthread_mutex_unlock(&m);
> 
> 	/* waking up */
> 	pthread_mutex_lock(&m);
> 	if (foo->sleeping_count > 0) {
> 		foo->sleeping_count--;
> 		pthread_cond_signal(&cv);
> 	}
> 	pthread_mutex_unlock(&m);
> 
> 
> If you make pthread_cond_signal() a preemption point, then you
> would get some ping-pong-like effect.  The newly awoken thread
> would block again on the mutex.

That may still be faster than going into the kernel.

And anyway, it would only occur if the other thread had a higher
priority. It would definitely help on a uniprocessor.


> 
> With a small number of threads, it probably doesn't make sense
> to have more than one KSE (unless they are mostly CPU-bound).
> In Drew's example, there are 2 KSEs (HTT enabled) and only 2 threads.
> Each time a thread sleeps, the KSE enters the kernel to sleep
> (kse_release()) because there are no other threads to run.

With only 1 KSE it is definitely worth adding pre-emption there.


> 
> Drew, can you try lowering the concurrency?  You can
> either try using pthread_setconcurrency(1) or setting
> kern.threads.virtual_cpu=1.
> 
> Julian, why does it take so long for the woken KSE to become
> active?

It will be the shorter of:

add up:
 time for the 'waker' to enter the UTS
 time for the UTS to decide it can awaken another KSE
 time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
 time until the next clock tick on the other processor (we STILL do not
  send an IPI to the other processor to make it schedule a new
  thread)

OR

add up:
 time for the 'waker' to enter the UTS
 time for the UTS to decide it can awaken another KSE
 time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
 time for kse_wakeup() to return to userland
 time for the UTS to return to the original thread
 time for the thread to do its syscall (select()?) and call msleep()
 time for the scheduler to cause an upcall
 time for the UTS to field the upcall
 time for the UTS to schedule the newly awoken thread onto the kernel thread

Note: the requested upcall will still happen on the other CPU,
but the UTS will find no work for it and send it back to sleep with
kse_release() again.



I suspect that, since clock ticks occur only every 10 ms, it will
nearly always be the 2nd option. In other words, the system is acting as
it would with a uniprocessor.

Compare this with:

add up:
 time for the 'waker' to enter the UTS
 time for the UTS to decide it can awaken another KSE
 time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
 time for the UTS to return to the original thread
 time for the UTS to switch to the other thread
(*) this is what is being measured here..

However, the 'select' thread will be restarted when the other thread
completes and re-does the wait, OR when a clock tick occurs.



julian




