pthread switch (was Odd KSE panic)

Julian Elischer julian at elischer.org
Mon Jul 5 14:44:26 PDT 2004



On Mon, 5 Jul 2004, Daniel Eischen wrote:

> On Mon, 5 Jul 2004, Julian Elischer wrote:
> 
> > 
> > 
> > On Mon, 5 Jul 2004, Daniel Eischen wrote:
> > 
> > > On Mon, 5 Jul 2004, Julian Elischer wrote:
> > > 
> > > > ------ the changed version:
> > > > 
> > > > Kernel thread 1 returns from waiting for the packet, and as user thread
> > > > A, signals (enters) the UTS to make user thread B runnable. The UTS
> > > > realises that there is no kernel thread currently running to run thread
> > > > B, so it enters the kernel briefly to make another thread wake up and
> > > > upcall, and then suspends thread A and takes on thread B.
> > > > Eventually the upcall occurs and thread A is assigned to the newly
> > > > awoken thread and re-enters the kernel, waiting for the next kernel event.
> > > > 
> > > > ------------
> > > > 
> > > > Dan, can the 2nd version happen now? 
> > > > can Drew make it happen by assigning a higher priority to thread B?
> > > 
> > > No, not yet; pthread_cond_signal() and pthread_mutex_unlock() are
> > > not currently preemption points.  I'm not sure that you would want
> > > pthread_cond_signal() to be a preemption point -- I'd imagine that
> > > a lot of applications would do something like this:
> > > 
> > > 	/* sleeping */
> > > 	pthread_mutex_lock(&m);
> > > 	foo->sleeping_count++;
> > > 	pthread_cond_wait(&cv, &m);
> > > 	pthread_mutex_unlock(&m);
> > > 
> > > 	/* waking up */
> > > 	pthread_mutex_lock(&m);
> > > 	if (foo->sleeping_count > 0) {
> > > 		foo->sleeping_count--;
> > > 		pthread_cond_signal(&cv);
> > > 	}
> > > 	pthread_mutex_unlock(&m);
> > > 
> > > 
> > > If you make pthread_cond_signal() a preemption point, then you
> > > would get some ping-pong-like effect.  The newly awoken thread
> > > would block again on the mutex.
> > 
> > That may still be faster than
> > going into the kernel
> > 
> > and anyway, it would only occur if the other thread had a higher
> > priority, and it would DEFINITELY help on a uniprocessor.
> 
> I don't see how it would help on a uniprocessor.  On a
> uniprocessor, as soon as one thread blocks, the other
> thread will be run.  The effect is the same except it
> lets the original thread run until it blocks.

The problem is that the other thread is NOT blocking for another 40 usecs
or so.

> 
>   thread A is blocking on ioctl().
>   thread B is waiting in pthread_cond_wait().
> 
>   ... time goes by, some event happens and releases thread A...
> 

an upcall goes to the UTS
>   UTS resumes thread A
>   thread A wakes up thread B via pthread_cond_signal()
>     [ thread B is marked runnable and added to UTS run queue ]
>   thread A continues until it blocks
>     [ if A blocks in kernel, upcall is made, else A is blocked
>       in userland and there is no userland <-> kernel boundary
>       crossing. ]
>   UTS runs thread B
>     [ B is resumed in userland since pthread_cond_wait() blocks
>       in userland. ]
>   thread B runs until it blocks
> 
> Compare this with:
> 
>   thread A is blocking on ioctl().
>   thread B is waiting in pthread_cond_wait().
>   
>   ... time goes by, some event happens and releases thread A...
>   

an upcall goes to the UTS
>   UTS resumes thread A
>   thread A wakes up thread B via pthread_cond_signal()
>     [ thread B is marked runnable and added to UTS run queue ]
>   UTS switches to thread B
>     [ thread A is swapped out in userland, thread B is resumed
>       in userland ]
>   thread B runs until it blocks
>     [ if B blocks in kernel, upcall is made, else B is blocked
>       in userland and there is no userland <-> kernel boundary
>       crossing. ]
>   UTS switches to thread A
>     [ thread A is resumed in userland ]
>   thread A continues until it blocks
> 
> In fact, there is one extra switch in the latter.

but that is not the point.. the ioctl is not latency critical.. 
the worker thread (B) is..
we want B to run as soon as possible..

in scenario A the following occurs before B runs:
an upcall goes to the UTS
>   UTS resumes thread A
>   thread A wakes up thread B via pthread_cond_signal()
>     [ thread B is marked runnable and added to UTS run queue ]
>   thread A continues until it blocks
>     [ if A blocks in kernel, upcall is made, else A is blocked
>       in userland and there is no userland <-> kernel boundary
>       crossing. ]
>   UTS runs thread B
>     [ B is resumed in userland since pthread_cond_wait() blocks
>       in userland. ]


whereas in scenario B the following occurs before B runs:
an upcall goes to the UTS
>   UTS resumes thread A
>   thread A wakes up thread B via pthread_cond_signal()
>     [ thread B is marked runnable and added to UTS run queue ]
>   UTS switches to thread B
>     [ thread A is swapped out in userland, thread B is resumed
>       in userland ]



which gets B run quicker?

A is rerun later, but not necessarily much later, and if we fix the
scheduling latency for switching CPUs it will occur quite soon..



> 
> We could easily add preemption points for higher priority
> threads.  That would allow the application to decide which
> thread is the most important.

exactly.. it's the application's decision.

> 
> The other problem is, for SMP, you want to avoid migrating
> threads between KSEs (processors).  The UTS doesn't currently
> do this.  There is only one scheduling queue per KSE group
> and all KSEs (allocated for process-scope threads) use the
> same scheduling queue hung off the group.

understood.

> 
> > > With a small number of threads, it probably doesn't make sense
> > > to have more than one KSE (unless they are mostly CPU-bound).
> > > In Drew's example, there are 2 KSEs (HTT enabled) and only 2 threads.
> > > Each time a thread sleeps, the KSE enters the kernel to sleep
> > > (kse_release()) because there are no other threads to run.
> > 
> > with 1 KSE it is definitely worth adding pre-emption there..
> 
> See above and tell me if you still think so.

Definitely. ESPECIALLY if it is not taken when the priorities are the
same, i.e. the status quo.

> 
> > > Julian, why does it take so long for the woken KSE to become
> > > active?
> > 
> > It will be the shorter of:
> > 
> > add up:
> >  time for the 'waker' to enter the UTS
> >  time for the UTS to decide it can awaken another KSE
> >  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
> >  time until the next clock tick on the other processor (we STILL do not
> >   send an IPI to the other processor yet to make it schedule a new
> >   thread).
> > 
> > OR
> > 
> > add up:
> >  time for the 'waker' to enter the UTS
> >  time for the UTS to decide it can awaken another KSE
> >  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
> >  time for kse_wakeup() to return to userland
> >  time for the UTS to return to the original thread
> >  time for the thread to do a syscall (what, select()?) and call msleep()
> >  time for the scheduler to cause an upcall
> >  time for the UTS to field the upcall
> >  time for the UTS to schedule the newly awoken thread on the kernel thread.
> 
> Note that the time for the UTS to decide it can wake another
> KSE is negligible.  At this point the UTS already owns the
> scheduler lock (no extra locks are needed), and it is merely
> looping through a linked list of KSEs hung off the group.
> For SMP there are only 2 KSEs, so it is just a few instructions.

understood, but the syscall to ask for B to be run on another KSE
(if it doesn't get a chance to run on this one first) doesn't
guarantee that the upcall it generates will occur any time soon.


> 
> > note, the requested upcall will still happen on the other CPU
> > but the UTS will find no work for it and send it back to sleep with
> > kse_release() again.
> > 
> > 
> > 
> > I suspect that as clock ticks occur at only a 10 msec period, it will
> > nearly always be the 2nd option. In other words the system is acting as
> > it would on a uniprocessor.
> > 
> > compare this with:
> > add up:
> > 
> >  time for the 'waker' to enter the UTS
> >  time for the UTS to decide it can awaken another KSE
> >  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
> >  time for the UTS to return to the original thread
> >  time for the UTS to switch to the other thread.
> > (*) this is what is being measured here..
> > 
> > however the 'select' thread will be restarted when the other thread
> > completes and re-does the wait OR a clock tick occurs.
> 
> Well, since the quoted time was 47usec, the KSE must be
> getting scheduled before the next clock tick.

The KSE is not rescheduled, but rather continues on with the original
thread back into select() again, at which point it converts into an
upcall and goes back to userland, where the newly awoken thread is
assigned to it and run, taking 47 usecs.

If the userland scheduler just switched to the other thread immediately,
the same thing would happen, except it would be the select() that took
place 47 usecs later, and not the worker thread.

> 
> -- 
> Dan Eischen
> 
> 



More information about the freebsd-threads mailing list