proposed smp_rendezvous change

Mon May 16 21:38:57 UTC 2011

On Monday 16 May 2011 16:46:03 John Baldwin wrote:
> On Monday, May 16, 2011 4:30:44 pm Max Laier wrote:
> > On Monday 16 May 2011 14:21:27 John Baldwin wrote:
> > > Yes, we need to fix that.  Humm, it doesn't preempt when you do a
> > > critical_exit() though?  Or do you use a hand-rolled critical exit that
> > > doesn't do a deferred preemption?
> > 
> > Right now I just did a manual td_critnest++/--, but I guess ...
> 
> Ah, ok, so you would "lose" a preemption.  That's not really ideal.
> 
> > > Actually, I'm curious how the spin unlock inside the IPI could yield
> > > the CPU.  Oh, is rmlock doing a wakeup inside the IPI handler?  I
> > > guess that is ok as long as the critical_exit() just defers the
> > > preemption to the end of the IPI handler.
> > 
> > ... the earliest point where it is safe to preempt is after doing the
> > 
> >    atomic_add_int(&smp_rv_waiters[2], 1);
> > 
> > so that we can start other IPIs again.  However, since we don't accept
> > new IPIs until we signal EOI in the MD code (on amd64), this might still
> > not be a good place to do the yield?!?
> 
> Hmm, yeah, you would want to do the EOI before you yield.  However, we
> could actually move the EOI up before calling the MI code so long as we
> leave interrupts disabled for the duration of the handler (which we do).
> 
> > The spin unlock boils down to a critical_exit() and unless we did a
> > critical_enter() at some point during the redenvouz setup, we will
> > yield() if we owepreempt.  I'm not quite sure how that can happen, but
> > it seems like there is a path that allows the scheduler to set it from a
> > foreign CPU.
> 
> No, it is only set on curthread by curthread.  This is actually my main
> question.  I've no idea how this could happen unless the rmlock code is
> actually triggering a wakeup or sched_add() in its rendezvous handler.
> 
> I don't see anything in rm_cleanIPI() that would do that however.
> 
> I wonder if your original issue was really fixed  just by the first patch
> you had which fixed the race in smp_rendezvous()?

I found the stack that lead me to this patch in the first place:

#0  sched_switch (td=0xffffff011a970000, newtd=0xffffff006e6784b0, 
flags=4) at src/sys/kern/sched_ule.c:1939
#1  0xffffffff80285c7f in mi_switch (flags=6, newtd=0x0) at 
src/sys/kern/kern_synch.c:475
#2  0xffffffff802a2cb3 in critical_exit () at src/sys/kern/kern_switch.c:185
#3  0xffffffff80465807 in spinlock_exit () at 
src/sys/amd64/amd64/machdep.c:1458
#4  0xffffffff8027adea in rm_cleanIPI (arg=<value optimized out>) at 
src/sys/kern/kern_rmlock.c:180
#5  0xffffffff802b9887 in smp_rendezvous_action () at 
src/sys/kern/subr_smp.c:402
#6  0xffffffff8045e2a4 in Xrendezvous () at 
src/sys/amd64/amd64/apic_vector.S:235
#7  0xffffffff802a2c6e in critical_exit () at src/sys/kern/kern_switch.c:179
#8  0xffffffff804365ba in uma_zfree_arg (zone=0xffffff009ff4b5a0, 
item=0xffffff000f34cd40, udata=0xffffff000f34ce08) at 
src/sys/vm/uma_core.c:2370
.
.
.

and now that I look at it again, it is clear that critical_exit() just isn't 
interrupt safe.  I'm not sure how to fix that, yet ... but this:

        if (td->td_critnest == 1) {
                td->td_critnest = 0;
                if (td->td_owepreempt) {
                        td->td_critnest = 1;

clearly doesn't work.

Max