request for review, callout fix.

John Baldwin jhb at freebsd.org
Fri Mar 21 20:55:12 UTC 2008


On Friday 21 March 2008 04:30:39 pm Alfred Perlstein wrote:
> * John Baldwin <jhb at freebsd.org> [080321 12:08] wrote:
> > On Friday 21 March 2008 12:32:25 am Alfred Perlstein wrote:
> > > Hi guys, attached is a fix for the timeout/untimeout race with
> > > Giant locked code.
> > > 
> > > Basically the old code would make the callout inaccessable
> > > right before calling it inside of softclock.
> > > 
> > > However only the callout lock is held, so when switching to
> > > the callout's associated mutex (in this case Giant) there's
> > > a race where a "local" (timeout/untimeout) callout would be
> > > fired even if stopped.
> > > 
> > > This patch fixes that.  We've run several hours of regression
> > > testing on a version of this for 6.x.
> > > 
> > > People internal to Juniper and iedowse@ helped with this.
> > > 
> > > Please review/comment.
> > 
> > Curious as to how c->c_flags could change if CALLOUT_LOCAL_ALLOC is set?  
> > Since it hasn't been enqueued on the callfree list, it isn't visible to 
any 
> > other code, so nothing should be able to mark it active or pending.  IOW, 
you 
> > should be able to do this in your second hunk:
> > 
> > 	if (c_flags & CALLOUT_LOCAL_ALLOC) {
> > 		KASSERT(c->c_flags == CALLOUT_LOCAL_ALLOC, ("corrupted callout"));
> > 		c->c_func = NULL;
> > 		SLIST_INSERT_HEAD(...);
> > 	}
> 
> It's more hairy than that.
> 
> Actually, I think you're right...
> 
> The confusion is the race for "callout_stop_safe",
> but now that I think about it, the softclock will only
> "grab" this callout if untimeout has NOT yet been called.
>
> If untimeout HAS been called, then softclock won't even see
> the callout.

Yes.

> If untimeout HAS NOT been called and softclock grabs the
> callout, it will have cleared CALLOUT_PENDING and then
> untimeout (callout_stop_safe) will no longer free it.

Yes.
 
> Therefore it is safe to omit the check for flags as you
> suggest.
>
> Is that right?

Yes, I believe so.  The tricky case is the race you are originally trying to 
fix which is that the untimeout() comes in after the spin lock is dropped but 
before Giant is acquired, but that falls under your second case above.

-- 
John Baldwin


More information about the freebsd-smp mailing list