pthread switch (was Odd KSE panic)

Mon Jul 5 12:03:31 PDT 2004

On Mon, 5 Jul 2004, Julian Elischer wrote:

> 
> 
> On Mon, 5 Jul 2004, Daniel Eischen wrote:
> 
> > On Mon, 5 Jul 2004, Julian Elischer wrote:
> > 
> > > ------ the changed version:
> > > 
> > > Kernel thread 1 returns from waiting for the packet, and as user thread
> > > A, signals (enters) the UTS to make user thread B runnable. The UTS
> > > realises that there is no kernel thread currently running to run thread
> > > B, so it enters the kernel briefly to make another thread wake up and
> > > upcall, and then suspends thread A, and takes on thread B.
> > > Eventually the upcall occurs and thread A is assigned to the newly
> > > awoken thread and re-enters the kernel, waiting for the next kernel.
> > > 
> > > ------------
> > > 
> > > Dan, can the 2nd version happen now? 
> > > can Drew make it happen by assigning a higher priority to thread B?
> > 
> > No, not yet; pthread_cond_signal() and pthread_mutex_unlock() are
> > not currently preemption points.  I'm not sure that you would want
> > pthread_cond_signal() to be a preemption point -- I'd imagine that
> > a lot of applications would do something like this:
> > 
> > 	/* sleeping */
> > 	pthread_mutex_lock(&m);
> > 	foo->sleeping_count++;
> > 	pthread_cond_wait(&cv, &m);
> > 	pthread_mutex_unlock(&m);
> > 
> > 	/* waking up */
> > 	pthread_mutex_lock(&m);
> > 	if (foo->sleeping_count > 0) {
> > 		foo->sleeping_count--;
> > 		pthread_cond_signal(&cv);
> > 	}
> > 	pthread_mutex_unlock(&m);
> > 
> > 
> > If you make pthread_cond_signal() a preemption point, then you
> > would get some ping-pong-like effect.  The newly awoken thread
> > would block again on the mutex.
> 
> That may still be faster than
> going into the kernel
> 
> and anyway, it would only occur if the other process had a higher
> priority, and it would DEFINITELY help on a uniprocessor.

I don't see how it would help on a uniprocessor.  On a
uniprocessor, as soon as one thread blocks, the other
thread will be run.  The effect is the same except it
lets the original thread run until it blocks.

  thread A is blocking on ioctl().
  thread B is waiting in pthread_cond_wait().

  ... time goes by, some event happens and releases thread A...

  UTS resumes thread A
  thread A wakes up thread B via pthread_cond_signal()
    [ thread B is marked runnable and added to UTS run queue ]
  thread A continues until it blocks
    [ if A blocks in kernel, upcall is made, else A is blocked
      in userland and there is no userland <-> kernel boundary
      crossing. ]
  UTS runs thread B
    [ B is resumed in userland since pthread_cond_wait() blocks
      in userland. ]
  thread B runs until it blocks

Compare this with:

  thread A is blocking on ioctl().
  thread B is waiting in pthread_cond_wait().

  ... time goes by, some event happens and releases thread A...

  UTS resumes thread A
  thread A wakes up thread B via pthread_cond_signal()
    [ thread B is marked runnable and added to UTS run queue ]
  UTS switches to thread B
    [ thread A is swapped out in userland, thread B is resumed
      in userland ]
  thread B runs until it blocks
    [ if B blocks in kernel, upcall is made, else B is blocked
      in userland and there is no userland <-> kernel boundary
      crossing. ]
  UTS switches to thread A
    [ thread A is resumed in userland ]
  thread A continues until it blocks

In fact, there is one extra switch in the latter.
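
To make the count concrete, here is a rough sketch that tallies the
userland switches in the two traces above.  It is a hypothetical model,
not libpthread code; the names wake_policy and count_switches are made
up for illustration.

```c
/* Hypothetical model of the two traces above; not libpthread code. */
enum wake_policy {
	RUN_UNTIL_BLOCK,	/* first trace: the signaller keeps running */
	SWITCH_ON_SIGNAL	/* second trace: the signal is a preemption point */
};

/*
 * Count the userland switches the UTS makes after it has resumed
 * thread A, until both A and B have run to their next blocking point.
 */
static int
count_switches(enum wake_policy policy)
{
	int switches = 0;

	/* A calls pthread_cond_signal(); B goes on the run queue. */
	if (policy == SWITCH_ON_SIGNAL) {
		switches++;	/* A -> B, immediately at the signal */
		switches++;	/* B blocks, UTS switches back to A */
	} else {
		switches++;	/* A blocks, UTS switches to B */
	}
	return (switches);
}
```

count_switches(RUN_UNTIL_BLOCK) gives 1 and
count_switches(SWITCH_ON_SIGNAL) gives 2, which is the extra switch.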

We could easily add preemption points for higher priority
threads.  That would allow the application to decide which
thread is the most important.
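
A priority-based preemption check could look roughly like the sketch
below.  The struct and function names are hypothetical, and it assumes
that a larger value means a higher priority.

```c
/* Hypothetical user-thread descriptor; the field name is illustrative. */
struct uthread {
	int	prio;	/* assumption: larger value = higher priority */
};

/*
 * Decide whether waking `woken` should preempt the running thread
 * `curr`.  Equal priority does not preempt, so the common
 * signal-then-unlock pattern stays on the run-until-block path.
 */
static int
wake_preempts(const struct uthread *curr, const struct uthread *woken)
{
	return (woken->prio > curr->prio);
}
```

With this rule, threads of equal priority never hit the ping-pong case,
and the application opts in by raising the priority of the thread it
wants to run first.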

The other problem is, for SMP, you want to avoid migrating
threads between KSEs (processors).  The UTS doesn't currently
do this.  There is only one scheduling queue per KSE group
and all KSEs (allocated for process-scope threads) use the
same scheduling queue hung off the group.
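
As a simplified illustration of why a single shared queue leads to
migration, consider the sketch below.  All names are hypothetical, KSEs
are reduced to integer ids, and the scheduler lock is omitted.

```c
#include <stddef.h>

/* Hypothetical run-queue entry for one user thread. */
struct run_entry {
	struct run_entry	*next;
	int			 last_kse;	/* KSE that last ran it, -1 if none */
};

/* One FIFO queue shared by every KSE in the group. */
struct shared_runq {
	struct run_entry	*head;
	struct run_entry	*tail;
};

static void
runq_add(struct shared_runq *rq, struct run_entry *td)
{
	td->next = NULL;
	if (rq->tail != NULL)
		rq->tail->next = td;
	else
		rq->head = td;
	rq->tail = td;
}

/*
 * Whichever KSE asks first takes the head of the queue.
 * Returns 1 if the thread last ran on a different KSE (a migration).
 */
static int
runq_take(struct shared_runq *rq, int kse_id, struct run_entry **out)
{
	struct run_entry *td = rq->head;
	int migrated;

	*out = td;
	if (td == NULL)
		return (0);
	rq->head = td->next;
	if (rq->head == NULL)
		rq->tail = NULL;
	migrated = (td->last_kse != -1 && td->last_kse != kse_id);
	td->last_kse = kse_id;
	return (migrated);
}
```

Nothing in runq_take() prefers the KSE the thread last ran on, so a
thread moves between processors whenever a different KSE happens to
dequeue it.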

> > With a small number of threads, it probably doesn't make sense
> > to have more than one KSE (unless they are mostly CPU-bound).
> > In Drew's example, there are 2 KSEs (HTT enabled) and only 2 threads.
> > Each time a thread sleeps, the KSE enters the kernel to sleep
> > (kse_release()) because there are no other threads to run.
> 
> with 1 KSE it is definitely worth adding pre-emption there..

See above and tell me if you still think so.

> > Julian, why does it take so long for the woken KSE to become
> > active?
> 
> It will be the shorter of:
> 
> add up:
>  time for 'waker' to enter the UTS
>  time for the UTS to decide it can awaken another KSE
>  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
>  time until next clock tick on the other processor (we STILL do not
>   send an IPI to the other processor yet to make it schedule a new
>   thread).
> 
> OR
> 
> add up:
>  time for 'waker' to enter the UTS
>  time for the UTS to decide it can awaken another KSE
>  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
>  time for kse_wakeup() to return to userland
>  time for UTS to return to original thread
>  time for thread to do syscall (what, select() ?) and call msleep()
>  time for scheduler to cause an upcall
>  time for UTS to field upcall
>  time for UTS to schedule the newly awoken thread on kernel thread.

Note that the time for the UTS to decide it can wake another
KSE is negligible.  At this point the UTS already owns the
scheduler lock (no extra locks are needed), and it is merely
looping through a linked list of KSEs hung off the group.
For SMP there are only 2 KSEs, so it is just a few instructions.
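
The scan amounts to something like the sketch below.  The struct layout
and the name find_idle_kse are hypothetical, and the caller is assumed
to already hold the scheduler lock.

```c
#include <stddef.h>

/* Hypothetical KSE descriptor, linked off its group. */
struct kse {
	struct kse	*next;
	int		 idle;	/* nonzero if sleeping in the kernel */
};

/* Hypothetical list head for the KSEs belonging to one group. */
struct kse_list {
	struct kse	*first;
};

/*
 * Return the first idle KSE in the group, or NULL if all are busy.
 * Assumes the caller already holds the scheduler lock.
 */
static struct kse *
find_idle_kse(const struct kse_list *kl)
{
	struct kse *k;

	for (k = kl->first; k != NULL; k = k->next)
		if (k->idle)
			return (k);
	return (NULL);
}
```

With two KSEs on the list the loop runs at most twice.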

> note, the requested upcall will still happen on the other CPU
> but the UTS will find no work for it and send it back to sleep with
> kse_release() again.
> 
> 
> 
> I suspect that, as clock ticks occur only at a 10 msec period, it will
> nearly always be the 2nd option. In other words the system is acting as
> it would with a uniprocessor.
> 
> compare this with:
> add up:
> 
>  time for 'waker' to enter the UTS
>  time for the UTS to decide it can awaken another KSE
>  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
>  time for UTS to return to original thread
>  time for UTS to switch to other thread.
> (*) this is what is being measured here..
> 
> however the 'select' thread will be restarted when the other thread
> completes and re-does the wait OR a clock tick occurs.

Well, since the quoted time was 47usec, the KSE must be
getting scheduled before the next clock tick.
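
As a back-of-the-envelope check, assume (my assumption, not a
measurement) that the wakeup lands at a uniformly random point within
the tick period.  The function names below are made up.

```c
/*
 * Assumption (not from the measurements): the wakeup request arrives
 * at a uniformly random point within the clock tick period.
 */

/* Mean time until the next tick, in microseconds. */
static double
mean_wait_for_tick_usec(double tick_usec)
{
	return (tick_usec / 2.0);
}

/* Chance that the next tick falls within `window_usec` of the wakeup. */
static double
chance_tick_within(double tick_usec, double window_usec)
{
	return (window_usec >= tick_usec ? 1.0 : window_usec / tick_usec);
}
```

With a 10 msec (10000 usec) tick, the mean wait for a tick would be
5000 usec, and a tick would fall within a 47 usec window less than
0.5% of the time.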

-- 
Dan Eischen