debugging frequent kernel panics on 8.2-RELEASE

Andriy Gapon avg at FreeBSD.org
Sat Aug 20 10:10:55 UTC 2011


on 20/08/2011 13:02 Andriy Gapon said the following:
> on 18/08/2011 02:15 Steven Hartland said the following:
>> In a nutshell the jail manager we're using will attempt to resurrect the jail
>> from a dying state in a few specific scenarios.
>>
>> Here's an example:
>> 1. jail restart requested
>> 2. jail is stopped, so the java process is killed off, but active tcp sessions
>> may prevent the timely full shutdown of the jail.
>> 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
>> starting a new jail we attach to the old one and exec the new java process.
>> 4. if an existing jail isn't detected, i.e. where there were no hanging tcp
>> sessions and #2 cleanly shut down the jail, a new jail is created, attached to
>> and the java exec'ed.
>>
>> The system uses static jailids so it's possible to determine if an existing
>> jail for this "service" exists or not. This prevents duplicate services as
>> well as making services easy to identify by their jailid.
>>
>> So what we could be seeing is a race between the jail shutdown and the attach
>> of the new process?
> 
> Not a jail expert at all, but a few suggestions...
> 
> First, wouldn't the 'persist' jail option simplify your life a little bit?
> 
> Second, you may want to try to monitor the value of the prison0.pr_uref
> variable (e.g. via kgdb) while executing the various scenarios of what you do
> now.  If, after finishing a certain scenario, you end up with a value lower
> than at the start of that scenario, then this is the troublesome one.
> Please note that prison0.pr_uref is the sum of the number of non-jailed
> processes and the number of top-level jails.  So take this into account when
> comparing prison0.pr_uref values - it's better to record the initial value when
> no jails are started and it's important to keep the number of non-jailed
> processes the same (or to account for its changes).
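
The bookkeeping described above can be sketched as a small helper.  This is
only an illustration of the arithmetic, assuming you have already read
prison0.pr_uref (e.g. via kgdb) and counted non-jailed processes and top-level
jails yourself; the function name and the dictionary keys are made up for this
example and are not part of any FreeBSD tool.

```python
def uref_drift(before, after):
    """Return the unexplained change in prison0.pr_uref between two snapshots.

    Each snapshot is a dict with hypothetical keys:
      "pr_uref" - observed value of prison0.pr_uref
      "procs"   - number of non-jailed processes
      "jails"   - number of top-level jails
    A negative result means pr_uref dropped more than the change in
    processes and jails can explain, i.e. a reference was lost.
    """
    expected_change = (after["procs"] - before["procs"]) + (
        after["jails"] - before["jails"]
    )
    observed_change = after["pr_uref"] - before["pr_uref"]
    return observed_change - expected_change


# Example with made-up numbers: same processes and jails, but pr_uref fell by 1.
before = {"pr_uref": 41, "procs": 40, "jails": 1}
after = {"pr_uref": 40, "procs": 40, "jails": 1}
drift = uref_drift(before, after)
```

A scenario that leaves a negative drift behind would be the one to look at.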

BTW, I suspect the following scenario, but I am not able to verify it either via
testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner, just kill enough non-jailed processes.
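
To make the suspected sequence concrete, here is a toy model of it.  It is a
sketch of the hypothesis only, not of the actual kernel code: the class, the
function names and the rule that attaching bumps only the jail's own counter
are all assumptions made for illustration.

```python
class Prison:
    """Toy stand-in for a prison with a user reference count (pr_uref)."""

    def __init__(self, parent=None, uref=0):
        self.parent = parent
        self.uref = uref


def proc_exit(prison):
    # A process in the prison exits and drops one user reference.
    prison.uref -= 1
    # When a jail's count reaches zero, its parent loses the reference
    # it held on behalf of that jail.
    if prison.uref == 0 and prison.parent is not None:
        prison.parent.uref -= 1


def attach_suspected(prison):
    # Suspected behaviour: attaching to a dying jail raises the jail's own
    # count but does NOT give the parent its reference back.
    prison.uref += 1


def simulate(nonjailed, cycles):
    """Return prison0's count after jail creation and after each restart."""
    prison0 = Prison(uref=nonjailed)
    jail = Prison(parent=prison0)
    prison0.uref += 1  # prison0 gains one reference for the new jail
    jail.uref += 1     # first process runs in the jail
    history = [prison0.uref]
    for _ in range(cycles):
        proc_exit(jail)          # last process in the jail exits
        attach_suspected(jail)   # jail is resurrected by attaching to it
        history.append(prison0.uref)
    return history


history = simulate(nonjailed=3, cycles=4)
```

With 3 non-jailed processes and 4 restart cycles the model gives
[4, 3, 2, 1, 0]: one reference lost per resurrection until prison0 hits zero.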

-- 
Andriy Gapon


More information about the freebsd-stable mailing list