Simpler SMP-friendly callout/timeout semantics
Ian Dowse
iedowse at maths.tcd.ie
Wed Nov 17 15:22:58 PST 2004
The current callout/timeout API supports Giant-locked and MP-safe
usage options, but in both cases it is very difficult to use the
API in a way that is free of race conditions and complications.
Below is an attempt to extend that API to offer much simpler semantics
to many users while not limiting its usefulness or performance.
A summary of the difficulties faced when using the current callout
API can be found at:
http://people.freebsd.org/~iedowse/callout/callout_current.txt
This subject has been discussed a few times before (e.g., search for
"Timeout and SMP race"). One of the suggestions, made by Archie
Cobbs, was to supply a mutex when the callout is created and have
the callout code use that to avoid many of the races. This is what
I've attempted to do. The patch below adds a new API function:
void callout_init_mtx(struct callout *c, struct mtx *mtx, int flags);
If a non-NULL `mtx' is supplied, then the callout system will acquire
that mutex before invoking the callout handler. Normally the handler
must return with the mutex still locked, but if the CALLOUT_RETURNUNLOCKED
flag is supplied, then the handler should unlock the mutex itself
before returning.
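For concreteness, here is a minimal sketch of how a driver might use
the new function. The foo_softc structure and all foo_* names are
hypothetical and only for illustration:

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/kernel.h>
	#include <sys/lock.h>
	#include <sys/mutex.h>
	#include <sys/callout.h>

	struct foo_softc {
		struct mtx	sc_mtx;		/* protects the whole softc */
		struct callout	sc_timer;
		int		sc_ticks;
	};

	static void	foo_tick(void *);	/* handler, sketched below */

	static void
	foo_start_timer(struct foo_softc *sc)
	{
		mtx_init(&sc->sc_mtx, "foo", NULL, MTX_DEF);
		/*
		 * Flags of 0 mean the handler returns with sc_mtx still
		 * held; passing CALLOUT_RETURNUNLOCKED instead would make
		 * the handler responsible for unlocking sc_mtx itself.
		 */
		callout_init_mtx(&sc->sc_timer, &sc->sc_mtx, 0);

		/* callout_reset() must be called with sc_mtx held. */
		mtx_lock(&sc->sc_mtx);
		callout_reset(&sc->sc_timer, hz, foo_tick, sc);
		mtx_unlock(&sc->sc_mtx);
	}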
Since the callout system is acquiring the mutex itself, it can check
if the callout has been cancelled while holding the mutex and before
invoking the handler or dereferencing the callout function pointer
or argument. When combined with a requirement of holding the mutex
during calls to callout_stop() and callout_reset(), this entirely
eliminates the races, so callout_stop() and callout_reset() reliably
do what you'd expect.
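Continuing the hypothetical foo_softc sketch above, the handler is
entered by softclock() with the mutex already held, so it can inspect
state and rearm itself without racing against callout_stop() or
callout_reset():

	static void
	foo_tick(void *arg)
	{
		struct foo_softc *sc = arg;

		/* softclock() acquired sc_mtx before calling the handler. */
		mtx_assert(&sc->sc_mtx, MA_OWNED);
		sc->sc_ticks++;

		/*
		 * Rearming is race-free: any caller of callout_stop() or
		 * callout_reset() must hold sc_mtx, so it cannot run until
		 * this handler returns.
		 */
		callout_reset(&sc->sc_timer, hz, foo_tick, sc);
		/* Return with sc_mtx held; CALLOUT_RETURNUNLOCKED is unset. */
	}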
The semantics of callouts initialised with the new function are:
o When calling callout_stop() or callout_reset() on the callout,
the specified mutex must be held.
o Calls to callout_stop() and callout_reset() guarantee that the
handler will not be invoked even if it has already been de-queued
by softclock(). There are no races since the mutex is held by
the caller of callout_stop() or callout_reset(). When the
CALLOUT_RETURNUNLOCKED flag is used, however, it is possible that
the handler function is still running; but because the caller
holds the mutex, the handler, if still active, is known to have
at least got past the mtx_unlock(). These functions do not sleep.
o Before destroying the specified mutex, callout_drain() must be
called. Since callout_drain() may sleep, the mutex must not be
held at this time. If the callout has not already been cancelled,
it might be invoked while callout_drain() executes. To avoid
this, call callout_stop() before releasing the mutex and then
call callout_drain() afterwards. The call to callout_drain() is
necessary to ensure that softclock() no longer references the
mutex (e.g. it might still be waiting to acquire it). A sketch
of this tear-down sequence follows below.
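Putting these rules together, a tear-down path for the hypothetical
foo_softc from the earlier sketches might look like this:

	static void
	foo_stop_timer(struct foo_softc *sc)
	{
		mtx_lock(&sc->sc_mtx);
		/* With sc_mtx held, this guarantees the handler won't run. */
		callout_stop(&sc->sc_timer);
		mtx_unlock(&sc->sc_mtx);

		/*
		 * callout_drain() may sleep, so sc_mtx cannot be held here.
		 * Once it returns, softclock() no longer references sc_mtx
		 * and the mutex can safely be destroyed.
		 */
		callout_drain(&sc->sc_timer);
		mtx_destroy(&sc->sc_mtx);
	}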
The patch also changes the semantics of all existing Giant-locked
callouts to match the above, with Giant implicitly specified as the
mutex. This means that Giant-locked users MUST hold Giant while
calling callout_stop(), callout_reset(), timeout() and untimeout(),
which will require a number of changes in various parts of the
kernel. It also means that fully Giant-locked code gets the race-free
semantics with no source changes.
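For legacy Giant-locked code the required pattern would look something
like the following sketch (names again hypothetical); code that already
runs entirely under Giant satisfies this without modification:

	static struct callout_handle foo_legacy_handle;

	static void	foo_legacy_tick(void *);

	static void
	foo_legacy_start(void *arg)
	{
		/* timeout() must now be called with Giant held... */
		mtx_lock(&Giant);
		foo_legacy_handle = timeout(foo_legacy_tick, arg, hz);
		mtx_unlock(&Giant);
	}

	static void
	foo_legacy_stop(void *arg)
	{
		/* ...and so must untimeout(). */
		mtx_lock(&Giant);
		untimeout(foo_legacy_tick, arg, foo_legacy_handle);
		mtx_unlock(&Giant);
	}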
The new API function handles some cases very well:
- Code that uses just one mutex per instance, such as many device
drivers.
- Giant-locked code, because again there is just one mutex involved.
However, there are more complex uses of the callout API that currently
use more than one mutex due to lock ordering requirements. A good
example is the TCP code, where the timeout routines can acquire a
global TCP mutex before the per-connection mutex, so this new scheme
on its own cannot be used directly; the situation is illustrated
below. Maybe it is possible to rework the TCP code to avoid this, or
to extend the callout API further so that an alternative locking
routine can be supplied, if that helps.
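To make the lock-order conflict concrete, here is a sketch with
invented names standing in for the real TCP locks: the agreed order is
a global mutex before the per-connection mutex, yet softclock() can
only be told to take the per-connection one, so the handler would have
to acquire the global lock in the wrong order:

	static struct mtx global_mtx;	/* stands in for the global TCP lock */

	struct conn {
		struct mtx	conn_mtx;	/* per-connection lock */
		struct callout	conn_timer;	/* init'ed with &conn_mtx */
	};

	static void
	conn_timer_handler(void *arg)
	{
		struct conn *cp = arg;

		/* softclock() enters here already holding conn_mtx... */
		mtx_assert(&cp->conn_mtx, MA_OWNED);

		/*
		 * ...so acquiring global_mtx now reverses the agreed
		 * global_mtx -> conn_mtx order and risks deadlock.  This
		 * is why the TCP timers cannot use the new scheme directly.
		 */
		mtx_lock(&global_mtx);
		/* ... timer work that needs both locks ... */
		mtx_unlock(&global_mtx);
	}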
Do people think that a change like this is worth attempting? Are
there further improvements that could be made or any other suggestions
or comments?
Ian
Additional stuff:
Patch to make the "re" driver use the new API (currently it uses
timeout/untimeout without locking Giant):
http://people.freebsd.org/~iedowse/callout/callout_re.diff
Patch that attempts to close a number of timeout races in the
ATA code using the new API:
http://people.freebsd.org/~iedowse/callout/callout_ata.diff
Finally the callout patch itself:
Index: sys/kern/kern_timeout.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/kern/kern_timeout.c,v
retrieving revision 1.91
diff -u -r1.91 kern_timeout.c
--- sys/kern/kern_timeout.c 6 Aug 2004 21:49:00 -0000 1.91
+++ sys/kern/kern_timeout.c 17 Nov 2004 02:31:28 -0000
@@ -53,6 +53,9 @@
static int avg_gcalls;
SYSCTL_INT(_debug, OID_AUTO, to_avg_gcalls, CTLFLAG_RD, &avg_gcalls, 0,
"Average number of Giant callouts made per softclock call. Units = 1/1000");
+static int avg_mtxcalls;
+SYSCTL_INT(_debug, OID_AUTO, to_avg_mtxcalls, CTLFLAG_RD, &avg_mtxcalls, 0,
+ "Average number of mtx callouts made per softclock call. Units = 1/1000");
static int avg_mpcalls;
SYSCTL_INT(_debug, OID_AUTO, to_avg_mpcalls, CTLFLAG_RD, &avg_mpcalls, 0,
"Average number of MP callouts made per softclock call. Units = 1/1000");
@@ -80,6 +83,12 @@
* If curr_callout is non-NULL, threads waiting on
* callout_wait will be woken up as soon as the
* relevant callout completes.
+ * curr_cancelled - Changing to 1 with both callout_lock and c_mtx held
+ * guarantees that the current callout will not run.
+ * The softclock() function sets this to 0 before it
+ * drops callout_lock to acquire c_mtx, and it calls
+ the handler only if curr_cancelled is still 0 when
+ * c_mtx is successfully acquired.
* wakeup_ctr - Incremented every time a thread wants to wait
* for a callout to complete. Modified only when
* curr_callout is non-NULL.
@@ -88,6 +97,7 @@
* cutt_callout is non-NULL.
*/
static struct callout *curr_callout;
+static int curr_cancelled;
static int wakeup_ctr;
static int wakeup_needed;
@@ -181,6 +191,7 @@
int steps; /* #steps since we last allowed interrupts */
int depth;
int mpcalls;
+ int mtxcalls;
int gcalls;
int wakeup_cookie;
#ifdef DIAGNOSTIC
@@ -195,6 +206,7 @@
#endif /* MAX_SOFTCLOCK_STEPS */
mpcalls = 0;
+ mtxcalls = 0;
gcalls = 0;
depth = 0;
steps = 0;
@@ -225,12 +237,14 @@
} else {
void (*c_func)(void *);
void *c_arg;
+ struct mtx *c_mtx;
int c_flags;
nextsoftcheck = TAILQ_NEXT(c, c_links.tqe);
TAILQ_REMOVE(bucket, c, c_links.tqe);
c_func = c->c_func;
c_arg = c->c_arg;
+ c_mtx = c->c_mtx;
c_flags = c->c_flags;
c->c_func = NULL;
if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
@@ -242,11 +256,32 @@
(c->c_flags & ~CALLOUT_PENDING);
}
curr_callout = c;
+ curr_cancelled = 0;
mtx_unlock_spin(&callout_lock);
- if (!(c_flags & CALLOUT_MPSAFE)) {
- mtx_lock(&Giant);
- gcalls++;
- CTR1(KTR_CALLOUT, "callout %p", c_func);
+ if (c_mtx != NULL) {
+ mtx_lock(c_mtx);
+ /*
+ * The callout may have been cancelled
+ * while we switched locks.
+ */
+ if (curr_cancelled) {
+ mtx_unlock(c_mtx);
+ mtx_lock_spin(&callout_lock);
+ goto done_locked;
+ }
+ /* The callout cannot be stopped now. */
+ curr_cancelled = 1;
+
+ if (c_mtx == &Giant) {
+ gcalls++;
+ CTR1(KTR_CALLOUT, "callout %p",
+ c_func);
+ } else {
+ mtxcalls++;
+ CTR1(KTR_CALLOUT,
+ "callout mtx %p",
+ c_func);
+ }
} else {
mpcalls++;
CTR1(KTR_CALLOUT, "callout mpsafe %p",
@@ -275,9 +310,10 @@
lastfunc = c_func;
}
#endif
- if (!(c_flags & CALLOUT_MPSAFE))
- mtx_unlock(&Giant);
+ if ((c_flags & CALLOUT_RETURNUNLOCKED) == 0)
+ mtx_unlock(c_mtx);
mtx_lock_spin(&callout_lock);
+done_locked:
curr_callout = NULL;
if (wakeup_needed) {
/*
@@ -300,6 +336,7 @@
}
avg_depth += (depth * 1000 - avg_depth) >> 8;
avg_mpcalls += (mpcalls * 1000 - avg_mpcalls) >> 8;
+ avg_mtxcalls += (mtxcalls * 1000 - avg_mtxcalls) >> 8;
avg_gcalls += (gcalls * 1000 - avg_gcalls) >> 8;
nextsoftcheck = NULL;
mtx_unlock_spin(&callout_lock);
@@ -395,15 +432,26 @@
void *arg;
{
+ if (c->c_mtx != NULL)
+ mtx_assert(c->c_mtx, MA_OWNED);
+
mtx_lock_spin(&callout_lock);
- if (c == curr_callout && wakeup_needed) {
+ if (c == curr_callout) {
/*
* We're being asked to reschedule a callout which is
- * currently in progress, and someone has called
- * callout_drain to kill that callout. Don't reschedule.
+ * currently in progress. If there is a mutex then we
+ * can cancel the callout if it has not really started.
*/
- mtx_unlock_spin(&callout_lock);
- return;
+ if (c->c_mtx != NULL && !curr_cancelled)
+ curr_cancelled = 1;
+ if (wakeup_needed) {
+ /*
+ * Someone has called callout_drain to kill this
+ * callout. Don't reschedule.
+ */
+ mtx_unlock_spin(&callout_lock);
+ return;
+ }
}
if (c->c_flags & CALLOUT_PENDING) {
if (nextsoftcheck == c) {
@@ -446,13 +494,20 @@
{
int wakeup_cookie;
+ if (!safe && c->c_mtx != NULL)
+ mtx_assert(c->c_mtx, MA_OWNED);
+
mtx_lock_spin(&callout_lock);
/*
* Don't attempt to delete a callout that's not on the queue.
*/
if (!(c->c_flags & CALLOUT_PENDING)) {
c->c_flags &= ~CALLOUT_ACTIVE;
- if (c == curr_callout && safe) {
+ if (c != curr_callout) {
+ mtx_unlock_spin(&callout_lock);
+ return (0);
+ }
+ if (safe) {
/* We need to wait until the callout is finished. */
wakeup_needed = 1;
wakeup_cookie = wakeup_ctr++;
@@ -468,6 +523,11 @@
cv_wait(&callout_wait, &callout_wait_lock);
mtx_unlock(&callout_wait_lock);
+ } else if (c->c_mtx != NULL && !curr_cancelled) {
+ /* We can stop the callout before it runs. */
+ curr_cancelled = 1;
+ mtx_unlock_spin(&callout_lock);
+ return (1);
} else
mtx_unlock_spin(&callout_lock);
return (0);
@@ -493,8 +553,29 @@
int mpsafe;
{
bzero(c, sizeof *c);
- if (mpsafe)
- c->c_flags |= CALLOUT_MPSAFE;
+ if (mpsafe) {
+ c->c_mtx = NULL;
+ c->c_flags = CALLOUT_RETURNUNLOCKED;
+ } else {
+ c->c_mtx = &Giant;
+ c->c_flags = 0;
+ }
+}
+
+void
+callout_init_mtx(c, mtx, flags)
+ struct callout *c;
+ struct mtx *mtx;
+ int flags;
+{
+ bzero(c, sizeof *c);
+ c->c_mtx = mtx;
+ KASSERT((flags & ~CALLOUT_RETURNUNLOCKED) == 0,
+ ("callout_init_mtx: bad flags %d", flags));
+ /* CALLOUT_RETURNUNLOCKED makes no sense without a mutex. */
+ KASSERT(mtx != NULL || (flags & CALLOUT_RETURNUNLOCKED) == 0,
+ ("callout_init_mtx: CALLOUT_RETURNUNLOCKED with no mutex"));
+ c->c_flags = flags & CALLOUT_RETURNUNLOCKED;
}
#ifdef APM_FIXUP_CALLTODO
Index: sys/sys/callout.h
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/sys/callout.h,v
retrieving revision 1.27
diff -u -r1.27 callout.h
--- sys/sys/callout.h 20 Apr 2004 15:49:31 -0000 1.27
+++ sys/sys/callout.h 17 Nov 2004 01:12:01 -0000
@@ -40,6 +40,8 @@
#include <sys/queue.h>
+struct mtx;
+
SLIST_HEAD(callout_list, callout);
TAILQ_HEAD(callout_tailq, callout);
@@ -51,6 +53,7 @@
int c_time; /* ticks to the event */
void *c_arg; /* function argument */
void (*c_func)(void *); /* function to call */
+ struct mtx *c_mtx; /* mutex to lock */
int c_flags; /* state of this entry */
};
@@ -58,6 +61,7 @@
#define CALLOUT_ACTIVE 0x0002 /* callout is currently active */
#define CALLOUT_PENDING 0x0004 /* callout is waiting for timeout */
#define CALLOUT_MPSAFE 0x0008 /* callout handler is mp safe */
+#define CALLOUT_RETURNUNLOCKED 0x0010 /* handler returns with mtx unlocked */
struct callout_handle {
struct callout *callout;
@@ -75,6 +79,7 @@
#define callout_deactivate(c) ((c)->c_flags &= ~CALLOUT_ACTIVE)
#define callout_drain(c) _callout_stop_safe(c, 1)
void callout_init(struct callout *, int);
+void callout_init_mtx(struct callout *, struct mtx *, int);
#define callout_pending(c) ((c)->c_flags & CALLOUT_PENDING)
void callout_reset(struct callout *, int, void (*)(void *), void *);
#define callout_stop(c) _callout_stop_safe(c, 0)