KSE/ia64 broken

Sat Nov 15 11:30:42 PST 2003

On Sat, Nov 15, 2003 at 12:36:42PM -0500, Daniel Eischen wrote:
> On Fri, 14 Nov 2003, Marcel Moolenaar wrote:
> 
> > Gang,
> > 
> > The following change broke KSE on ia64:
> > 
> > --------
> > revision 1.18
> > date: 2003/11/08 06:07:04;  author: davidxu;  state: Exp;  lines: +16 -17
> > Use THR lock instead of KSE lock to avoid scheduler be blocked in spinlock.
> >  
> > Reviewed by: deischen
> > --------
> > 
> > We seem to be clobbering the thread structure instead of writing
> > to the mailbox. This happens at initialization. Can it be that
> > the change assumes PER_KSE and doesxn't work for PER_THREAD?
> 
> I _think_ this may be because rltd-elf (at least for ia64) calls
> malloc with the rtld lock held.  I'm not sure how to test this
> theory.

No worries, I have a way to disproof it :-)

Staticly linked binaries are as much broken as dynamicly linked
binaries. So, if we have a rtld problem, it's not the only one:

The following triggers the bug (from the configure script of Python):

\begin{C}
#include <pthread.h>
void* routine(void* p){return NULL;}
int main(){
  pthread_t p;
  if(pthread_create(&p,NULL,routine,NULL)!=0)
    return 1;
  (void)pthread_detach(p);
  return 0;
}
\end{C}

(gdb) run
Starting program: /nfs/home/marcel/KSE/foo

Program received signal SIGILL, Illegal instruction.
_pthread_create (thread=0x200000000009b8d8, attr=0x9fffffffffffeb00,
    start_routine=@0x2000000000082520: 0x2000000000034cf0 <sig_daemon>,
    arg=0x0) at /nfs/freebsd/5.x/src/lib/libpthread/thread/thr_create.c:202
202                             new_thread->tcb->tcb_tmbx.tm_context.uc_stack.ss_size =
(gdb) bt
#0  _pthread_create (thread=0x200000000009b8d8, attr=0x9fffffffffffeb00,
    start_routine=@0x2000000000082520: 0x2000000000034cf0 <sig_daemon>,
    arg=0x0) at /nfs/freebsd/5.x/src/lib/libpthread/thread/thr_create.c:202
#1  0x2000000000035430 in _thr_start_sig_daemon ()
    at /nfs/freebsd/5.x/src/lib/libpthread/thread/thr_sig.c:236
#2  0x2000000000004e80 in _kse_setthreaded (threaded=1)
    at /nfs/freebsd/5.x/src/lib/libpthread/thread/thr_kern.c:449
#3  0x2000000000001490 in _pthread_create (thread=0x9fffffffffffebc0,
    attr=0x0, start_routine=@0x2000000000082090: 0x2000000000000380 <routine>,
    arg=0x0) at /nfs/freebsd/5.x/src/lib/libpthread/thread/thr_create.c:111
#4  0x20000000000003e0 in main () at foo.c:7
(gdb) p *new_thread
$1 = {tcb = 0xfffffffffffefeff, magic = 4294967295, name = 0x0, uniqueid = 0,
  tle = {tqe_next = 0x9fffffffffffeaa0, tqe_prev = 0x0}, kle = {
    tqe_next = 0x2000000000001e00, tqe_prev = 0x4041}, gcle = {
    tqe_next = 0xc00000000000048f, tqe_prev = 0x8000000000000130}, hle = {
    le_next = 0x0, le_prev = 0x0}, lock = {l_head = 0x20000000000b1ad0,
    l_tail = 0xf, l_type = 2322596671, l_wait = 0,
    l_wakeup = @0x2000000000081d40: 0x20000000000055e0 <_thr_lock_wakeup>},
		:
	(partially shown)
(gdb) p *crit
$2 = {tm_context = {uc_sigmask = {__bits = {0, 0, 0, 0}}, uc_mcontext = {
      mc_flags = 3, _reserved_ = 0, mc_special = {sp = 11529215046068464288,
        unat = 0, rp = 2305843009213701744, pr = 32833,
        pfs = 13835058055282164879, bspstore = 9223372036854776160, rnat = 0,
        __spare = 0, tp = 2305843009214421712, rsc = 15,
        fpsr = 2674341018862399, psr = 19872948576272,
        gp = 2305843009214327112, ndirty = 0, cfm = 9223372036854776975,
        iip = 2305843009213701936, ifa = 18446744073709488367,
        isr = 549755813936}, mc_preserved = {unat = 0, gr4 = 2131440464,
        gr5 = 2147314880, gr6 = 1423, gr7 = 2135720224, br1 = 2145524112,
        br2 = 9223372041147674448, br3 = 9223372041147674368,
        br4 = 9223372041147674464, br5 = 2146730000, lc = 0, __spare = 0},
		: 
	(partially shown)

If you interpret the data then new_thread is clobbered by a ucontext:

(gdb) p /x *(ucontext_t*)new_thread
$6 = {uc_sigmask = {__bits = {0xfffefeff, 0xffffffff, 0xffffffff,
      0xffffffff}}, uc_mcontext = {mc_flags = 0x0, _reserved_ = 0x0,
    mc_special = {sp = 0x9fffffffffffeaa0, unat = 0x0,
      rp = 0x2000000000001e00, pr = 0x4041, pfs = 0xc00000000000048f,
      bspstore = 0x8000000000000130, rnat = 0x0, __spare = 0x0,
      tp = 0x20000000000b1ad0, rsc = 0xf, fpsr = 0x9804c8a70033f, psr = 0x0,
      gp = 0x2000000000081d40, ndirty = 0x20000000000a8460, cfm = 0x0,
      iip = 0x0, ifa = 0x20000000000bc000, isr = 0x20000000000bc000},
    mc_preserved = {unat = 0x0, gr4 = 0x7f0b3350, gr5 = 0x7ffd6cc0,
      gr6 = 0x58f, gr7 = 0x7f4c8120, br1 = 0x7fe21990,
      br2 = 0x80000000ffe06f50, br3 = 0x80000000ffe06f00,
      br4 = 0x80000000ffe06f60, br5 = 0x7ff48010, lc = 0x0, __spare = 0x0},
    mc_preserved_fp = {fr2 = {fpr_bits = {0x0 <repeats 16 times>},
		: 
	(partially shown)

mc_flags = 0, which means that the context is synchronous, and with
ifa and isr set to new_thread, it looks like there may have been
an async context before that (those are the registers we use to tell
the kernel which value (=isr) to write at what location (=ifa) when
we restore an async context. The sync has a valid thread pointer in
it (tp=0x20000000000b1ad0):

(gdb) p /x *(struct tcb*)(((ucontext_t*)new_thread)->uc_mcontext.mc_special.tp - (sizeof(struct tcb) - sizeof(struct ia64_tp)))
$13 = {tcb_tmbx = {tm_context = {uc_sigmask = {__bits = {0x0, 0x0, 0x0, 0x0}},
      uc_mcontext = {mc_flags = 0x3, _reserved_ = 0x0, mc_special = {
          sp = 0x9fffffffffffeaa0, unat = 0x0, rp = 0x2000000000001e70,
          pr = 0x8041, pfs = 0xc00000000000048f,
          bspstore = 0x8000000000000160, rnat = 0x0, __spare = 0x0,
          tp = 0x20000000000b1ad0, rsc = 0xf, fpsr = 0x9804c8a70033f,
          psr = 0x1213080a6010, gp = 0x200000000009a948, ndirty = 0x0,
          cfm = 0x800000000000048f, iip = 0x2000000000001f30,
          ifa = 0xffffffffffff08ef, isr = 0x8000000030}, mc_preserved = {
          unat = 0x0, gr4 = 0x7f0b3350, gr5 = 0x7ffd6cc0, gr6 = 0x58f,
          gr7 = 0x7f4c8120, br1 = 0x7fe21990, br2 = 0x80000000ffe06f50,
          br3 = 0x80000000ffe06f00, br4 = 0x80000000ffe06f60,
          br5 = 0x7ff48010, lc = 0x0, __spare = 0x0}, mc_preserved_fp = {
          fr2 = {fpr_bits = {0x0 <repeats 16 times>},
            fpr_flt = 0x00000000000000000000000000000000}, fr3 = {fpr_bits = {
              0x0 <repeats 16 times>},
		:
	(partially shown) 

The mailbox has an async context in it caused by what looks like a
page fault (faulting address = 0xffffffffffff08ef). The instruction
that caused the page fault is at 0x2000000000001f30. Note that
mc_flags = 3, which tells us it is indeed an async context and that
the high FP registers have been saved in the context.

(gdb) l *0x2000000000001f30
0x2000000000001f30 is in _pthread_create (/nfs/freebsd/5.x/src/lib/libpthread/thread/thr_create.c:202).
197                             _kse_critical_leave(crit);
198
199                             new_thread->tcb->tcb_tmbx.tm_udata = new_thread;
200                             new_thread->tcb->tcb_tmbx.tm_context.uc_sigmask =
201                                 new_thread->sigmask;
202                             new_thread->tcb->tcb_tmbx.tm_context.uc_stack.ss_size =
203                                 new_thread->attr.stacksize_attr;
204                             new_thread->tcb->tcb_tmbx.tm_context.uc_stack.ss_sp =
205                                 new_thread->attr.stackaddr_attr;
206                             makecontext(&new_thread->tcb->tcb_tmbx.tm_context,

So, the page fault was really the SIGILL. This indicates that the
internal consistency is reliable. Yet, our newly created thread
structure is clobbered:

(gdb) l 193
188
189                             /*
190                              * Initialize the machine context.
191                              * Enter a critical region to get consistent context.
192                              */
193                             crit = _kse_critical_enter();
194                             THR_GETCONTEXT(&new_thread->tcb->tcb_tmbx.tm_context);
195                             /* Initialize the thread for signals: */
196                             new_thread->sigmask = curthread->sigmask;
197                             _kse_critical_leave(crit);

This suddenly looks very suspicious...

(gdb) p /x curthread->sigmask
$17 = {__bits = {0xfffefeff, 0xffffffff, 0xffffffff, 0xffffffff}}
(gdb) p /x *new_thread
$18 = {tcb = 0xfffffffffffefeff, magic = 0xffffffff, name = 0x0,
  uniqueid = 0x0, tle = {tqe_next = 0x9fffffffffffeaa0, tqe_prev = 0x0},

-- 
 Marcel Moolenaar	  USPA: A-39004		 marcel at xcllnt.net