pthread switch (was Odd KSE panic)

Mon Jul 5 15:19:50 PDT 2004

On Mon, 5 Jul 2004, Julian Elischer wrote:
> 
> On Mon, 5 Jul 2004, Daniel Eischen wrote:
> 
> > On Mon, 5 Jul 2004, Julian Elischer wrote:
> > > 
> > > That may still be faster than
> > > going into the kernel
> > > 
> > > and anyway, it would only occur if the other process had a higher
> > > priority, and it would DEFINITELY help on a uniprocessor.
> > 
> > I don't see how it would help on a uniprocessor.  On a
> > uniprocessor, as soon as one thread blocks, the other
> > thread will be run.  The effect is the same except it
> > lets the original thread run until it blocks.
> 
> The problem is that the other thread is NOT blocking for another 40uSecs
> or so.

But libpthread doesn't care which thread runs, as long as _some_
thread runs.  You're making assumptions about the application
where the library doesn't want to be making any assumptions.

> >   thread A is blocking on ioctl().
> >   thread B is waiting in pthread_cond_wait().
> > 
> >   ... time goes by, some event happens and releases thread A...
> > 
> 
> an upcall goes to the UTS
> >   UTS resumes thread A
> >   thread A wakes up thread B via pthread_cond_signal()
> >     [ thread B is marked runnable and added to UTS run queue ]
> >   thread A continues until it blocks
> >     [ if A blocks in kernel, upcall is made, else A is blocked
> >       in userland and there is no userland <-> kernel boundary
> >       crossing. ]
> >   UTS runs thread B
> >     [ B is resumed in userland since pthread_cond_wait() blocks
> >       in userland. ]
> >   thread B runs until it blocks
> > 
> > Compare this with:
> > 
> >   thread A is blocking on ioctl().
> >   thread B is waiting in pthread_cond_wait().
> >   
> >   ... time goes by, some event happens and releases thread A...
> >   
> 
> an upcall goes to the UTS
> >   UTS resumes thread A
> >   thread A wakes up thread B via pthread_cond_signal()
> >     [ thread B is marked runnable and added to UTS run queue ]
> >   UTS switches to thread B
> >     [ thread A is swapped out in userland, thread B is resumed
> >       in userland ]
> >   thread B runs until it blocks
> >     [ if B blocks in kernel, upcall is made, else B is blocked
> >       in userland and there is no userland <-> kernel boundary
> >       crossing. ]
> >   UTS switches to thread A
> >     [ thread A is resumed in userland ]
> >   thread A continues until it blocks
> > 
> > In fact, there is one extra switch in the latter.
> 
> but that is not the point.. the ioctl is not latency critical.. 
> the worker thread (B) is..
> we want B to run as soon as possible..

How does the library know that?  As I said we can insert
preemption points in pthread_cond_signal(), pthread_mutex_unlock(),
and perhaps a couple of others, but it would only be for higher
priority threads.  And I'm not convinced that we even need to
do that.
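
To make "preemption point" concrete, here is a rough sketch of the
check I have in mind. It is not libpthread code: struct uthread and
signal_preemption_point() are made-up names for illustration, and
the real UTS keeps a proper run queue rather than a single flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a userland thread. */
struct uthread {
    int prio;      /* larger value means higher priority (assumption) */
    int runnable;  /* nonzero once the thread is on the run queue */
};

/*
 * Called by the signalling thread (cur) after it has woken another
 * thread (woken).  The woken thread is always made runnable, but we
 * only switch to it when its priority is strictly higher.  With equal
 * priorities the current thread keeps running, i.e. the status quo.
 * Returns the thread that should run next.
 */
static struct uthread *
signal_preemption_point(struct uthread *cur, struct uthread *woken)
{
    assert(cur != NULL && woken != NULL);
    woken->runnable = 1;
    if (woken->prio > cur->prio)
        return woken;   /* preempt: switch in userland */
    return cur;         /* no switch: keep running the signaller */
}
```

With this, an application that wants the worker to run first raises
the worker's priority; the library itself makes no guess about which
thread matters more.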

> in scenario A the following occurs before B runs..
> an upcall goes to the UTS
> >   UTS resumes thread A
> >   thread A wakes up thread B via pthread_cond_signal()
> >     [ thread B is marked runnable and added to UTS run queue ]
> >   thread A continues until it blocks
> >     [ if A blocks in kernel, upcall is made, else A is blocked
> >       in userland and there is no userland <-> kernel boundary
> >       crossing. ]
> >   UTS runs thread B
> >     [ B is resumed in userland since pthread_cond_wait() blocks
> >       in userland. ]
> 
> 
> where in scenario B the following occurs before B runs:
> an upcall goes to the UTS
> >   UTS resumes thread A
> >   thread A wakes up thread B via pthread_cond_signal()
> >     [ thread B is marked runnable and added to UTS run queue ]
> >   UTS switches to thread B
> >     [ thread A is swapped out in userland, thread B is resumed
> >       in userland ]
> 
> 
> 
> which gets B run quicker?
> 
> A is rerun later, but not necessarily much later and if we fix the
> scheduling latency for switching CPUs it will occur quite soon..
> 
> 
> > 
> > We could easily add preemption points for higher priority
> > threads.  That would allow the application to decide which
> > thread is the most important.
> 
> exactly.. it's the application's decision.
> 
> > 
> > The other problem is, for SMP, you want to avoid migrating
> > threads between KSEs (processors).  The UTS doesn't currently
> > do this.  There is only one scheduling queue per KSE group
> > and all KSEs (allocated for process-scope threads) use the
> > same scheduling queue hung off the group.
> 
> understood.
> 
> > 
> > > > With a small number of threads, it probably doesn't make sense
> > > > to have more than one KSE (unless they are mostly CPU-bound).
> > > > In Drew's example, there are 2 KSEs (HTT enabled) and only 2 threads.
> > > > Each time a thread sleeps, the KSE enters the kernel to sleep
> > > > (kse_release()) because there are no other threads to run.
> > > 
> > > with 1 KSE it is definitely worth adding pre-emption there..
> > 
> > See above and tell me if you still think so.
> 
> definitely. ESPECIALLY if it is not taken when the priorities are the
> same, i.e. the status quo.
>
> > > compare this with:
> > > add up:
> > > 
> > > >  time for 'waker' to enter the UTS
> > > >  time for the UTS to decide it can awaken another KSE
> > > >  time for the UTS to call kse_wakeup() (from kernel entry to "wakeup()")
> > >  time for UTS to return to original thread
> > >  time for UTS to switch to other thread.
> > > (*) this is what is being measured here..
> > > 
> > > > however the 'select' thread will be restarted when the other thread
> > > completes and re-does the wait OR a clock tick occurs.
> > 
> > Well, since the quoted time was 47usec, the KSE must be
> > getting scheduled before the next clock tick.
> 
> The KSE is not rescheduled, but, rather, continues on with the original
> thread back into select() again, at which point it converts into an 
> upcall and goes back to userland where the newly awoken thread is
> assigned to it and run, taking 47 usecs.
> 
> If the userland scheduler just switched to the other thread immediately,
> the same thing would happen, except it would be the select() that took
> place 47 usecs later and not the worker thread. 

I don't think you can assume the worker thread is being run
on the same KSE.  HTT is enabled and there are two KSEs.

Look at it this way.  The thread waiting in ioctl() (call it
the I/O thread) is going continuously:

  while (1) {
    /* Wait for some event. */
    ioctl(...);

    /* Signal the worker thread. */
    pthread_cond_signal(&cv);
  }

and the worker thread is going continuously:

  while (1) {
    /* Wait for events. */
    pthread_mutex_lock(&m);
    pthread_cond_wait(&cv, &m);
    pthread_mutex_unlock(&m);
    /* Process event. */
    ...
  }

It doesn't matter which thread gets run after the I/O thread
calls pthread_cond_signal().  Either way, the I/O thread has
to reenter the kernel and wait for the next event.  And that
time adds to the latency of the worker thread the _next_ time.
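
For anyone who wants to experiment, the two loops above can be
fleshed out roughly as follows. This is only a sketch under some
assumptions: nanosleep() stands in for the blocking ioctl(), the
"pending" counter and "done" flag are additions so that the wait has
a predicate and the demo terminates, and run_demo() is a made-up
helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int pending;   /* events signalled but not yet processed */
static int done;      /* set once the I/O thread has finished */

/* I/O thread: wait for an event, then signal the worker. */
static void *io_thread(void *arg)
{
    int n = *(int *)arg;
    struct timespec ts = { 0, 1000000 };   /* 1 ms */

    for (int i = 0; i < n; i++) {
        nanosleep(&ts, NULL);              /* stand-in for ioctl(...) */
        pthread_mutex_lock(&m);
        pending++;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Worker thread: wait for events and process them. */
static void *worker_thread(void *arg)
{
    int *processed = arg;

    for (;;) {
        pthread_mutex_lock(&m);
        while (pending == 0 && !done)      /* guards against spurious wakeups */
            pthread_cond_wait(&cv, &m);
        if (pending == 0 && done) {
            pthread_mutex_unlock(&m);
            break;
        }
        pending--;
        pthread_mutex_unlock(&m);
        (*processed)++;                    /* "process event" */
    }
    return NULL;
}

/* Runs both threads for n_events events; returns how many were processed. */
int run_demo(int n_events)
{
    pthread_t io, worker;
    int processed = 0;
    int rc;

    pending = 0;
    done = 0;
    rc = pthread_create(&worker, NULL, worker_thread, &processed);
    assert(rc == 0);
    rc = pthread_create(&io, NULL, io_thread, &n_events);
    assert(rc == 0);
    pthread_join(io, NULL);
    pthread_join(worker, NULL);
    return processed;
}
```

Calling run_demo(5) should process all five events whichever thread
the scheduler picks after each pthread_cond_signal(); only the
latency seen by the worker differs, not the result.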

-- 
Dan Eischen