debugging frequent kernel panics on 8.2-RELEASE
Andriy Gapon
avg at FreeBSD.org
Sat Aug 20 20:34:56 UTC 2011
on 20/08/2011 23:24 Steven Hartland said the following:
> ----- Original Message ----- From: "Steven Hartland" <killing at multiplay.co.uk>
>> Looking through the code I believe I may have noticed a scenario which could
>> trigger the problem.
>>
>> Given the following code:-
>>
>> static void
>> prison_deref(struct prison *pr, int flags)
>> {
>> struct prison *ppr, *tpr;
>> int vfslocked;
>>
>> if (!(flags & PD_LOCKED))
>> mtx_lock(&pr->pr_mtx);
>> /* Decrement the user references in a separate loop. */
>> if (flags & PD_DEUREF) {
>> for (tpr = pr;; tpr = tpr->pr_parent) {
>> if (tpr != pr)
>> mtx_lock(&tpr->pr_mtx);
>> if (--tpr->pr_uref > 0)
>> break;
>> KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>> mtx_unlock(&tpr->pr_mtx);
>> }
>> /* Done if there were only user references to remove. */
>> if (!(flags & PD_DEREF)) {
>> mtx_unlock(&tpr->pr_mtx);
>> if (flags & PD_LIST_SLOCKED)
>> sx_sunlock(&allprison_lock);
>> else if (flags & PD_LIST_XLOCKED)
>> sx_xunlock(&allprison_lock);
>> return;
>> }
>> if (tpr != pr) {
>> mtx_unlock(&tpr->pr_mtx);
>> mtx_lock(&pr->pr_mtx);
>> }
>> }
>>
>> If you take a scenario of a simple one level prison setup running a single
>> process
>> where a prison has just been stopped.
>>
>> In the above code pr_uref of the processes prison is decremented. As this is the
>> last process then pr_uref will hit 0 and the loop continues instead of breaking
>> early.
>>
>> Now at the end of the loop iteration the mtx is unlocked so other process can
>> now manipulate the jail, this is where I think the problem may be.
>>
>> If we now have another process come in and attach to the jail but then instantly
>> exit, this process may allow another kernel thread to hit this same bit of code
>> and so two process for the same prison get into the section which decrements
>> prison0's pr_uref, instead of only one.
>>
>> In essence I think we can get the following flow where 1# = process1
>> and 2# = process2
>> 1#1. prison1.pr_uref = 1 (single process jail)
>> 1#2. prison_deref( prison1,...
>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 1#3. prison0.pr_uref--
>> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
>> 2#2. process1.exit
>> 2#3. prison_deref( prison1,...
>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1)
>>
>> It seems like the action on the parent prison to decrement the pr_uref is
>> happening too early, while the jail can still be used and without the lock on
>> the child jails mtx, so causing a race condition.
>>
>> I think the fix is to the move the decrement of parent prison pr_uref's down
>> so it only takes place if the jail is "really" being removed. Either that or
>> to change the locking semantics so that once the lock is aquired in this
>> prison_deref its not unlocked until the function completes.
>>
>> What do people think?
>
> After reviewing the changes to prison_deref in commit which added hierarchical
> jails, the removal of the lock by the inital loop on the passed in prison may
> be unintentional.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
>
>
> If so the following may be all that's needed to fix this issue:-
>
> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
> --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
> +++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
> @@ -2455,7 +2455,8 @@
> if (--tpr->pr_uref > 0)
> break;
> KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
> - mtx_unlock(&tpr->pr_mtx);
> + if (tpr != pr)
> + mtx_unlock(&tpr->pr_mtx);
> }
> /* Done if there were only user references to remove. */
> if (!(flags & PD_DEREF)) {
Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.
--
Andriy Gapon
More information about the freebsd-jail
mailing list