timeout/untimeout race conditions/crash [patch]
John Baldwin
jhb at freebsd.org
Mon Mar 17 09:44:10 PDT 2008
On Friday 14 March 2008 10:41:14 pm Alfred Perlstein wrote:
> We think we tracked down a defect in timeout/untimeout in
> FreeBSD.
>
> We have reduced the problem to the following scenario:
>
> 2+ cpu system, one cpu is running softclock at the same time
> another thread is running on another cpu which makes use of
> timeout/untimeout.
>
> CPU 0 is running "softclock"
> CPU 1 is running "driver" with Giant held.
>
> softclock: mtx_lock_spin(&callout_lock)
> softclock: CACHES the callout structure's fields.
> softclock: sees that it's a CALLOUT_LOCAL_ALLOC
> softclock: executes this code:
> if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
> c->c_func = NULL;
> c->c_flags = CALLOUT_LOCAL_ALLOC;
> SLIST_INSERT_HEAD(&callfree, c,
> c_links.sle);
> curr_callout = NULL;
> } else {
>
> NOTE: that c->c_func has been set to NULL and curr_callout
> is also NULL.
> softclock: mtx_unlock_spin(&callout_lock)
> driver: calls untimeout(), the following sequence happens:
> mtx_lock_spin(&callout_lock);
> if (handle.callout->c_func == ftn && handle.callout->c_arg == arg)
> callout_stop(handle.callout);
> mtx_unlock_spin(&callout_lock);
>
> NOTE: untimeout() sees that handle.callout->c_func is not set
> to the function so it does NOT call callout_stop(9)!
> driver: free's backing structure for c->c_arg.
> softclock: executes callout.
> softclock: likely crashes at this point due to access after free.
>
> I have a patch I'm trying out here, but I need feedback on it.
>
> The way the patch works is to treat CALLOUT_LOCAL_ALLOC (timeout/untimeout)
> callouts the same as ~CALLOUT_LOCAL_ALLOC allocs, and moves the
> freelist manipulation to the end of the callout dispatch.
>
> Some light testing seems to have the system work.
>
> We are doing some testing in-house to also make sure this works.
>
> Please provide feedback.
>
> See attached delta.
This is not a bug. Don't use untimeout(9) as it is not guaranteed to be
reliable. Instead, use callout_*(). Your patch doesn't solve any races as
the driver detach routine needs to use callout_drain() and not just
callout_stop/untimeout anyways. Fix your broken drivers.
--
John Baldwin
More information about the freebsd-stable
mailing list