NFS + nullfs + jail = zombies?
James Gritton
jamie at gritton.org
Sat Jul 9 14:23:08 UTC 2016
On 2016-07-08 12:28, Thomas Johnson wrote:
> I am working on developing a clustered application utilizing jails and
> running into problems that seem to be NFS-related. I'm hoping that
> someone can point out my error.
>
> The jail images and my application data are served via NFS. The host
> mounts NFS at boot, and then uses nullfs mounts to assemble the jail
> tree when the jail is created (fstab files and jail.conf are below).
> This seems to work fine, the jail starts and is usable. The problem
> comes when I remove/restart the jail. Frequently (but not
> consistently), the jail gets stuck in a dying state, causing the
> unmount of the jail root (nullfs) to fail with a "device busy" error.
>
> # jail -f /var/local/jail.conf -r wds1-1a
> Stopping cron.
> Waiting for PIDS: 1361.
> .
> Terminated
> wds1-1a: removed
> umount: unmount of /var/jail/wds1-1a failed: Device busy
> # jls -av
>    JID  Hostname   Path
>         Name       State
>         CPUSetID
>         IP Address(es)
>      1  wds1-1a    /var/jail/wds1-1a
>         wds1-1a    DYING
>         2
>         2620:1:1:1:1a::1
>
> Through trial-and-error I have determined that forcing an unmount of
> the root works, but subsequent mounts to that mount point will fail to
> unmount with the same error. Deleting and recreating the mountpoint
> fixes the mounting issue, but the dying jail remains permanently.
>
> I have also found that if I copy the jail root to local storage and
> update the jail's fstab to nullfs mount this, the problem seems to go
> away. This leads me to believe that the issue is related to the NFS
> source for the nullfs mount. statd and lockd are both running on the
> host.
>
> My relevant configurations are below. I can provide any other
> information desired.
>
> # Host fstab line for jail root.
> #
> 10.219.212.1:/vol/dev/wds/jail_base /jail/base nfs ro 0 0
>
>
> # Jail fstab file (mount.fstab)
> #
> /jail/base /var/jail/wds1-1a nullfs ro 0 0
> # writable (UFS-backed) /var
> /var/jail-vars/wds1-1a /var/jail/wds1-1a/var nullfs rw 0 0
>
>
> # jail.conf file
> #
> * {
> devfs_ruleset = "4";
> mount.devfs;
> exec.start = "/bin/sh /etc/rc";
> exec.stop = "/bin/sh /etc/rc.shutdown";
> interface = "vmx1";
> allow.dying = 1;
> exec.prestart = "/usr/local/bin/rsync -avC --delete /jail/${image}/var/ /var/jail-vars/${host.hostname}/";
> }
>
> # JMANAGE wds1-1a
> wds1-1a {
> path = "/var/jail/wds1-1a";
> ip6.addr = "2620:1:1:1:1a::1";
> host.hostname = "wds1-1a";
> host.domainname = "dev";
> mount.fstab = "/var/local/fstab.wds1-1a";
> $image = "base";
> }
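The trial-and-error recovery described above (force the unmount, then delete and recreate the mount point) can be sketched as a script. This is a dry-run sketch only: the `run` wrapper echoes each command instead of executing it, and the ordering of the nested unmounts is an assumption, not something stated in the report. Drop the wrapper and run as root to apply it for real.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence described in this thread.
# Paths match the configuration above.
run() { echo "$@"; }    # remove the echo to actually execute

JROOT=/var/jail/wds1-1a

# Force the unmount that "Device busy" blocked. The nested rw /var
# nullfs mount presumably has to go first, or the root stays busy.
run umount -f "$JROOT/var"
run umount -f "$JROOT"

# Per the report, a mount point that needed a forced unmount stays
# unusable for later mounts; recreating the directory works around it.
run rmdir "$JROOT"
run mkdir -p "$JROOT"
```

Note that even after this, the report says the dying jail itself remains permanently; the script only recovers the mount point.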
What happens if you take jails out of the equation? I know jails are
involved here to some degree, but I wonder whether a jail is actually
required for the mount point to become impossible to remount. I've heard
before of NFS-related problems where a jail remains dying forever, but
that has been more of an annoyance than a real problem.
It's not so much that I want to absolve jails, as I want to see where
the main fight exists. It's tricky enough fixing an interface between
two systems, but we've got three here.
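Taking jails out of the equation could be sketched as exercising the same NFS-backed nullfs mount/unmount cycle directly on the host. This is a dry-run sketch under assumptions: `/jail/base` is the NFS mount from the host fstab above, `/tmp/nullfs-test` is a hypothetical scratch mount point, and the `run` wrapper echoes commands rather than executing them (drop it and run as root to test for real).

```shell
#!/bin/sh
# Dry-run sketch: exercise the NFS -> nullfs -> unmount path with no
# jail involved, to see whether "Device busy" appears anyway.
run() { echo "$@"; }    # remove the echo to actually execute

SRC=/jail/base          # NFS-backed lower layer (from the host fstab)
DST=/tmp/nullfs-test    # hypothetical scratch mount point

run mkdir -p "$DST"
run mount -t nullfs -o ro "$SRC" "$DST"
# Generate some read activity on the mount, then try a plain unmount:
run find "$DST" -type f -name '*.conf'
run umount "$DST"
```

If the plain `umount` fails with "Device busy" here too, the problem lives in the NFS/nullfs interaction and jails are off the hook.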
- Jamie
More information about the freebsd-jail mailing list