Jail stuck in dying

James Gritton jamie at freebsd.org
Thu Mar 9 15:19:05 UTC 2017


On 2017-03-08 19:01, Kristof Provost wrote:
> Hi,
> 
> On a current box (r314933) I can’t seem to stop my jails.
> 
> It’s started like this:
> 
> 	jail -c name=test0 host.hostname=test vnet persist \
> 		vnet.interface=epair0b \
> 		path=/usr/jails/jail1 exec.start="/bin/sh /etc/rc"
> 
> I terminate it with `jail -R test0`, yet the jail stays stuck in dying 
> state:
> 
> 	$ jls -a
> 	   JID  IP Address      Hostname                      Path
> 	     1                  test                          /usr/jails/jail1
> 
> 	$ jls -na
> 	devfs_ruleset=0 dying enforce_statfs=2 host=new ip4=inherit
> ip6=inherit jid=1 name=test0 osreldate=1200023 osrelease=12.0-CURRENT
> parent=0 path=/usr/jails/jail1 nopersist securelevel=-1
> sysvmsg=disable sysvsem=disable sysvshm=disable vnet=new
> allow.nochflags allow.nomount allow.mount.nodevfs
> allow.mount.nofdescfs allow.mount.nolinprocfs allow.mount.nolinsysfs
> allow.mount.nonullfs allow.mount.noprocfs allow.mount.notmpfs
> allow.mount.nozfs allow.noquotas allow.noraw_sockets
> allow.set_hostname allow.nosocket_af allow.nosysvipc children.cur=0
> children.max=0 cpuset.id=2 host.domainname="" host.hostid=0
> host.hostname=test host.hostuuid=00000000-0000-0000-0000-000000000000
> ip4.addr= ip4.saddrsel ip6.addr= ip6.saddrsel
> 
> I’ve tried debugging this, but the most I can say is that there
> appears to be something wrong with the reference counting.
> prison_deref() returns because pr->pr_ref is 3. There are no more
> jailed processes running, so I have no idea why this happens.
> 
> The problem doesn’t appear to be related to vnet. I can reproduce it
> without setting the vnet flag when creating the jail as well.

It's never about processes - dying jails are those that have no 
processes but still have other references to them in the kernel.  Those 
"other references" are almost always through the cred system: crcopy() 
calls prison_hold() to increase the associated prison's pr_ref, and 
crfree() calls prison_free().

The hard part is tracking down just what might be holding such a 
credential; there is nothing that points from a prison to its associated 
creds.  But it's almost always associated with the network stack.  In 
normal operation, a jail may stick around for a little while in the 
dying state until its just-closed TCP connections time out.  I've heard 
reports of NFS mounts associated with jails sometimes causing such 
references that don't go away.  There are of course many other places 
that use creds, but most of them are in some way associated with 
processes.

As long as the problem only manifests as jails in the dying list, the 
easiest solution is to ignore it.  Provided you don't have hard-coded 
JIDs in jail.conf, you can re-create the jail and it will get a 
different JID.  The need for a particular JID isn't what it used to be, 
and chances are you can get away with dynamically numbered jails.  If 
you want to track down what's holding the jail half-alive, it becomes a 
matter of finding out exactly what (probably network) resources a jail is 
using.
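
A minimal sketch of both options, assuming `jls -na` prints one line per 
jail as in the quoted output above (the wrapping there comes from the 
mail client).  The helper names dying_jails and jid_of are hypothetical, 
not part of FreeBSD:

```sh
#!/bin/sh
# Hypothetical helpers for illustration; neither ships with FreeBSD.

# Read `jls -na`-style lines on stdin (one jail per line) and print
# "JID NAME" for every jail that carries the bare "dying" flag.
# The match is on the exact token, so a "nodying" token never counts.
# Limitation: quoted values containing spaces are split naively.
dying_jails() {
    awk '{
        dying = 0; jid = "?"; name = "?"
        for (i = 1; i <= NF; i++) {
            if ($i == "dying")       dying = 1
            else if ($i ~ /^jid=/)   jid = substr($i, 5)
            else if ($i ~ /^name=/)  name = substr($i, 6)
        }
        if (dying) print jid, name
    }'
}

# Look up the current, dynamically assigned JID of a jail by name,
# so that scripts do not need to hard-code the number.
jid_of() {
    jls -j "$1" jid
}
```

Running `jls -na | dying_jails` then lists only the jails still held by 
some kernel reference, and `jid_of test0` prints whatever JID the 
re-created jail was given.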

- Jamie

