timeout/untimeout race conditions/crash [patch]

John Baldwin jhb at freebsd.org
Mon Mar 17 09:44:10 PDT 2008


On Friday 14 March 2008 10:41:14 pm Alfred Perlstein wrote:
> We think we tracked down a defect in timeout/untimeout in
> FreeBSD.
> 
> We have reduced the problem to the following scenario:
> 
> 2+ cpu system, one cpu is running softclock at the same time
> another thread is running on another cpu which makes use of
> timeout/untimeout.
> 
> CPU 0 is running "softclock"
> CPU 1 is running "driver" with Giant held.
> 
> softclock: mtx_lock_spin(&callout_lock)
> softclock: CACHES the callout structure's fields.
> softclock: sees that it's a CALLOUT_LOCAL_ALLOC
> softclock: executes this code:
>   if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
>   	c->c_func = NULL;
>   	c->c_flags = CALLOUT_LOCAL_ALLOC;
>   	SLIST_INSERT_HEAD(&callfree, c,
>   		c_links.sle);
>   	curr_callout = NULL;
>   } else {
>   
>   NOTE: c->c_func has been set to NULL and curr_callout
>         is also NULL.
> softclock: mtx_unlock_spin(&callout_lock)
> driver: calls untimeout(), the following sequence happens:
>         mtx_lock_spin(&callout_lock);
>         if (handle.callout->c_func == ftn && handle.callout->c_arg == arg)
>                 callout_stop(handle.callout);
>         mtx_unlock_spin(&callout_lock);
> 
>   NOTE: untimeout() sees that handle.callout->c_func is not set
>         to the function so it does NOT call callout_stop(9)!
> driver: frees the backing structure for c->c_arg.
> softclock: executes callout.
> softclock: likely crashes at this point due to access after free.
> 
> I have a patch I'm trying out here, but I need feedback on it.
> 
> The patch treats CALLOUT_LOCAL_ALLOC (timeout/untimeout) callouts
> the same as non-CALLOUT_LOCAL_ALLOC callouts, and moves the
> freelist manipulation to the end of the callout dispatch.
> 
> In some light testing, the system seems to work.
> 
> We are doing some testing in-house to also make sure this works.
> 
> Please provide feedback.
> 
> See attached delta.

This is not a bug.  Don't use untimeout(9), as it is not guaranteed to be 
reliable.  Instead, use callout_*().  Your patch doesn't solve any races, 
because the driver detach routine needs to use callout_drain(), which waits 
for a callout that is already in progress to finish, and not just 
callout_stop()/untimeout() anyway.  Fix your broken drivers.
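
For anyone following along, the interleaving in the quoted report can be
replayed in program order in userspace.  The sketch below is only a
single-threaded model, not kernel code: struct model_callout,
model_callout_stop(), model_untimeout(), driver_tick() and replay_race()
are made-up names, and the flag value is just a placeholder.  Only the
c_func/c_arg comparison and the CALLOUT_LOCAL_ALLOC branch are taken from
the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag for the model; stands in for the kernel's flag. */
#define CALLOUT_LOCAL_ALLOC 0x0001

typedef void (*timeout_fn)(void *);

/* Hypothetical, stripped-down stand-in for struct callout. */
struct model_callout {
	timeout_fn c_func;
	void *c_arg;
	int c_flags;
};

/* Counts how often the modeled callout_stop() was reached. */
static int stop_calls;

static void
model_callout_stop(struct model_callout *c)
{
	stop_calls++;
	c->c_func = NULL;
}

/* Same comparison as the untimeout() fragment in the quoted report. */
static void
model_untimeout(struct model_callout *c, timeout_fn ftn, void *arg)
{
	assert(c != NULL);
	if (c->c_func == ftn && c->c_arg == arg)
		model_callout_stop(c);
}

/* Dummy driver handler; never actually invoked by the model. */
static void
driver_tick(void *arg)
{
	(void)arg;
}

/*
 * Replays the reported steps in order.  Returns 1 if "softclock" is
 * still holding the driver's handler and argument after untimeout()
 * returned without stopping anything, 0 otherwise.
 */
static int
replay_race(void)
{
	int softc = 0;	/* stands in for the driver's per-device state */
	struct model_callout c = { driver_tick, &softc, CALLOUT_LOCAL_ALLOC };
	timeout_fn cached_func;
	void *cached_arg;

	stop_calls = 0;

	/* softclock: cache the callout's fields. */
	cached_func = c.c_func;
	cached_arg = c.c_arg;

	/* softclock: recycle the CALLOUT_LOCAL_ALLOC callout. */
	if (c.c_flags & CALLOUT_LOCAL_ALLOC) {
		c.c_func = NULL;
		c.c_flags = CALLOUT_LOCAL_ALLOC;
	}

	/* driver: untimeout() compares against the cleared c_func. */
	model_untimeout(&c, driver_tick, &softc);

	/*
	 * driver would free softc here; softclock would then call
	 * cached_func(cached_arg) on the freed memory.
	 */
	return (stop_calls == 0 && cached_func == driver_tick &&
	    cached_arg == (void *)&softc);
}
```

In this model replay_race() returns 1: the stop path is never reached, yet
the cached handler and argument are still pending.  A stop-style call cannot
help with a handler that is about to run, which is why detach has to wait
with callout_drain().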

-- 
John Baldwin


More information about the freebsd-stable mailing list