kern/57945: [patch] Add locking to kqueue to make it MP-safe

Mon Oct 13 10:57:58 PDT 2003

On Mon, 13 Oct 2003, Stefan Farfeleder wrote:

> >Description:
> The current kqueue implementation does not seem to be MP-safe.  The
> kqueue facility does not having its own locks, I believe it was
> intended to rely on Giant.  As Giant gets locked down, kqueue was
> apparently forgotten.  The KNOTE() macro is spread over the whole
> kernel, sometimes called with Giant hold, sometimes not.  The function
> knote_enqueue() (called via knote() and KNOTE_ACTIVATE()) inserts a node
> into a TAILQ, this operation isn't atomic.  If another processor
> executes KNOTE() or kevent() at the same time, the queue might get
> corrupted.
> ...
> >Fix:
> I'm using patches similar to this one since about a year, the
> kqueue-enabled make(1) runs without problems with it.  Unfortunately the
> mutex is (I'm not sure it has to be) acquired and released rather often
> in the loop in kqueue_scan() to avoid deadlocks due to LORs.  Maybe a
> spin lock would increase the performance?  (I didn't do any benchmarks.)

The small critical sections seem to be mostly required.

Spin mutexes are actually less efficient, since in the usual uncontested
case they do the same things plus extra calls to critical_enter() and
critical exit.

> --- kqueue_lock.diff begins here ---
> Index: src/sys/kern/kern_event.c
> ===================================================================
> RCS file: /usr/home/ncvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.60
> diff -u -r1.60 kern_event.c
> --- src/sys/kern/kern_event.c	18 Jun 2003 18:16:39 -0000	1.60
> +++ src/sys/kern/kern_event.c	13 Oct 2003 00:35:42 -0000
> ...
> @@ -704,15 +705,16 @@
>
>  start:
>  	kevp = kq->kq_kev;
> -	s = splhigh();
> +	mtx_lock(&kq->kq_mtx);
>  	if (kq->kq_count == 0) {
>  		if (timeout < 0) {
>  			error = EWOULDBLOCK;
> +			mtx_unlock(&kq->kq_mtx);
>  		} else {
>  			kq->kq_state |= KQ_SLEEP;
> -			error = tsleep(kq, PSOCK | PCATCH, "kqread", timeout);
> +			error = msleep(kq, &kq->kq_mtx, PSOCK | PCATCH | PDROP,
> +			    "kqread", timeout);
>  		}
> -		splx(s);

Not copying the spl locking seems to give a bug here.  msleep() returns with
the mutex held.

>  		if (error == 0)
>  			goto retry;
>  		/* don't restart after signals... */
> @@ -728,20 +730,22 @@
>  		kn = TAILQ_FIRST(&kq->kq_head);
>  		TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
>  		if (kn == &marker) {
> -			splx(s);
> +			mtx_unlock(&kq->kq_mtx);
>  			if (count == maxevents)
>  				goto retry;
>  			goto done;
>  		}
> +		kq->kq_count--;
> +		mtx_unlock(&kq->kq_mtx);
>  		if (kn->kn_status & KN_DISABLED) {
>  			kn->kn_status &= ~KN_QUEUED;
> -			kq->kq_count--;
> +			mtx_lock(&kq->kq_mtx);
>  			continue;
>  		}

I don't see any reason to unlock/lock here.  There were no splx()/splhigh()'s.
splhigh() is very global but kq_mtx is very local so keeping the
critical sections short should be much less needed than before.

>  		if ((kn->kn_flags & EV_ONESHOT) == 0 &&
>  		    kn->kn_fop->f_event(kn, 0) == 0) {
>  			kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE);
> -			kq->kq_count--;
> +			mtx_lock(&kq->kq_mtx);
>  			continue;
>  		}
>  		*kevp = kn->kn_kevent;

Here dropping the mutex seems to be be necessary for calling the
function, but the function call was locked by splhigh() before.

BTW, what locks kn?

Bruce