Jail stuck in dying
James Gritton
jamie at freebsd.org
Thu Mar 9 15:19:05 UTC 2017
On 2017-03-08 19:01, Kristof Provost wrote:
> Hi,
>
> On a current box (r314933) I can’t seem to stop my jails.
>
> It’s started like this:
>
> jail -c name=test0 host.hostname=test vnet persist \
>     vnet.interface=epair0b \
>     path=/usr/jails/jail1 exec.start="/bin/sh /etc/rc"
>
> I terminate it with `jail -R test0`, yet the jail stays stuck in dying
> state:
>
> $ jls -a
> JID IP Address Hostname Path
> 1 test /usr/jails/jail1
>
> $ jls -na
> devfs_ruleset=0 dying enforce_statfs=2 host=new ip4=inherit
> ip6=inherit jid=1 name=test0 osreldate=1200023 osrelease=12.0-CURRENT
> parent=0 path=/usr/jails/jail1 nopersist securelevel=-1
> sysvmsg=disable sysvsem=disable sysvshm=disable vnet=new
> allow.nochflags allow.nomount allow.mount.nodevfs
> allow.mount.nofdescfs allow.mount.nolinprocfs allow.mount.nolinsysfs
> allow.mount.nonullfs allow.mount.noprocfs allow.mount.notmpfs
> allow.mount.nozfs allow.noquotas allow.noraw_sockets
> allow.set_hostname allow.nosocket_af allow.nosysvipc children.cur=0
> children.max=0 cpuset.id=2 host.domainname="" host.hostid=0
> host.hostname=test host.hostuuid=00000000-0000-0000-0000-000000000000
> ip4.addr= ip4.saddrsel ip6.addr= ip6.saddrsel
>
> I’ve tried debugging this, but the most I can say is that there
> appears to be something wrong with the reference counting.
> prison_deref() returns because pr->pr_ref is 3. There are no more
> jailed processes running, so I have no idea why this happens.
>
> The problem doesn’t appear to be related to vnet. I can reproduce it
> without setting the vnet flag when creating the jail as well.

It's never about processes - dying jails are those that have no
processes but still have other references to them in the kernel. Those
"other references" are almost always through the cred system: crcopy()
calls prison_hold() to increase the associated prison's pr_ref, and
crfree() calls prison_free().
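
To make the hold/free pairing concrete, here is a toy userland model in C.
It is only a sketch, not the kernel code: struct toy_prison, struct
toy_cred and every toy_* function are made-up names for illustration, and
the model ignores details such as persist. It shows a credential that
outlives the last process keeping the prison in the dying state.

```c
/*
 * Toy userland model of the reference counting described above.
 * NOT kernel code: all toy_* names are hypothetical stand-ins.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_prison {
    int  ref;     /* stands in for pr_ref: outstanding references */
    int  nprocs;  /* processes currently attached to the jail */
    bool freed;   /* set once the last reference has been dropped */
};

struct toy_cred {
    struct toy_prison *prison;  /* the prison this credential pins */
};

/* In the spirit of prison_hold(): take one reference. */
static void toy_prison_hold(struct toy_prison *pr)
{
    pr->ref++;
}

/* In the spirit of prison_free(): drop one reference; the last one frees. */
static void toy_prison_free(struct toy_prison *pr)
{
    assert(pr->ref > 0);
    if (--pr->ref == 0)
        pr->freed = true;
}

/* A credential holds its prison for as long as the credential exists. */
static struct toy_cred toy_cred_create(struct toy_prison *pr)
{
    struct toy_cred cr = { .prison = pr };

    toy_prison_hold(pr);
    return cr;
}

/* Releasing the credential gives the reference back. */
static void toy_cred_release(struct toy_cred *cr)
{
    toy_prison_free(cr->prison);
    cr->prison = NULL;
}

/* "Dying" in this model: no processes left, but references remain. */
static bool toy_prison_is_dying(const struct toy_prison *pr)
{
    return !pr->freed && pr->nprocs == 0 && pr->ref > 0;
}
```

In this model, a prison that loses its last process while some toy_cred
(think of a socket's credential) still exists reports as dying, and only
goes away once toy_cred_release() drops the final reference.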

The hard part is tracking down just what might be holding such a
credential; there is nothing that points from a prison to its associated
creds. But it's almost always associated with the network stack. In
normal operation, a jail may stick around for a little while in the
dying state until its just-closed TCP connections time out. I've heard
reports of NFS mounts associated with jails sometimes causing such
references that don't go away. There are of course many other places
that use creds, but most of them are in some way associated with
processes.

As long as the problem only manifests as jails in the dying list, the
easiest solution is to ignore it. Provided you don't have hard-coded
JIDs in jail.conf, you can re-create the jail and it will get a
different JID. The need for a particular JID isn't what it used to be,
and chances are you can get away with dynamically numbered jails. If
you want to track down what's holding the jail half-alive, it becomes a
matter of finding out exactly what (probably network) resources the jail
is using.

- Jamie