KSE/ia64 broken

Daniel Eischen eischen at vigrid.com
Sun Nov 16 09:18:35 PST 2003


On Sat, 15 Nov 2003, Marcel Moolenaar wrote:

> On Sat, Nov 15, 2003 at 12:36:42PM -0500, Daniel Eischen wrote:
> > On Fri, 14 Nov 2003, Marcel Moolenaar wrote:
> > 
> > > Gang,
> > > 
> > > The following change broke KSE on ia64:
> > > 
> > > --------
> > > revision 1.18
> > > date: 2003/11/08 06:07:04;  author: davidxu;  state: Exp;  lines: +16 -17
> > > Use THR lock instead of KSE lock to avoid scheduler be blocked in spinlock.
> > >  
> > > Reviewed by: deischen
> > > --------
> > > 
> > > We seem to be clobbering the thread structure instead of writing
> > > to the mailbox. This happens at initialization. Can it be that
> > > the change assumes PER_KSE and doesn't work for PER_THREAD?
> > 
> > I _think_ this may be because rtld-elf (at least for ia64) calls
> > malloc with the rtld lock held.  I'm not sure how to test this
> > theory.
> 
> No worries, I have a way to disprove it :-)
> 
> Statically linked binaries are just as broken as dynamically linked
> binaries. So, if we have a rtld problem, it's not the only one:

Are you sure there's not an ia64 kernel bug or ia64 context
restoring bug?  If I enable debug messages in thread/thr_kern.c
(uncomment #define DBG_MSG), I get:

  Found completed thread 6000000000014000, name initial thread
  Continuing thread 6000000000014000 in critical region
  Switching out thread 6000000000014000, state 0
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
    ...

repeatedly.

The first two lines tell us that the thread blocked while in a
critical region and that the kernel now reports it as unblocked.
The critical region may be the malloc spinlock, and the thread
may have blocked because of a page fault.  Is it possible that
the blocked context is incorrectly marked, or that it is just
not being resumed properly?

-- 
Dan Eischen



More information about the freebsd-threads mailing list