NFS + nullfs + jail = zombies?

Thomas Johnson tommyj27 at gmail.com
Sun Jul 10 15:28:45 UTC 2016


If NFS and jails are known to misbehave together, I can look at
alternatives. That said, if there is interest in troubleshooting this
further, I am set up to play guinea pig.

To test this scenario without jails, I set jail_enable="NO" and moved
the nullfs lines from my "jail" fstab into /etc/fstab. After a
half-dozen cycles of "reboot, generate activity on the mount, umount",
I have been unable to reproduce the behavior. FWIW, I did have to mark
the nullfs mounts "late" in this configuration so that NFS was mounted
before the nullfs mounts were attempted.
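
For reference, the moved lines in /etc/fstab look roughly like this
(same paths as in the jail fstab quoted below, with the "late" keyword
added so they are mounted after NFS by rc.d/mountlate):

/jail/base              /var/jail/wds1-1a      nullfs  ro,late  0  0
/var/jail-vars/wds1-1a  /var/jail/wds1-1a/var  nullfs  rw,late  0  0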

On Sat, Jul 9, 2016 at 9:22 AM, James Gritton <jamie at gritton.org> wrote:
> On 2016-07-08 12:28, Thomas Johnson wrote:
>>
>> I am developing a clustered application that uses jails, and I am
>> running into problems that seem to be NFS-related. I'm hoping that
>> someone can point out my error.
>>
>> The jail images and my application data are served via NFS. The host
>> mounts NFS at boot, and then uses nullfs mounts to assemble the jail
>> tree when the jail is created (fstab files and jail.conf are below).
>> This seems to work fine: the jail starts and is usable. The problem
>> comes when I remove/restart the jail. Frequently (but not
>> consistently), the jail gets stuck in a dying state, causing the
>> unmount of the jail root (nullfs) to fail with a "device busy" error.
>>
>> # jail -f /var/local/jail.conf -r wds1-1a
>> Stopping cron.
>> Waiting for PIDS: 1361.
>> .
>> Terminated
>> wds1-1a: removed
>> umount: unmount of /var/jail/wds1-1a failed: Device busy
>> # jls -av
>>    JID  Hostname                      Path
>>         Name                          State
>>         CPUSetID
>>         IP Address(es)
>>      1  wds1-1a                       /var/jail/wds1-1a
>>         wds1-1a                       DYING
>>         2
>>         2620:1:1:1:1a::1
>>
>> Through trial and error I have determined that forcing an unmount of
>> the root works, but anything subsequently mounted on that mount point
>> then fails to unmount with the same error. Deleting and recreating the
>> mount point fixes the mounting issue, but the dying jail remains
>> permanently.
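>>
>> For reference, the forced unmount is just the standard umount -f, and
>> the subsequent mount is what the jail fstab does again on the next
>> start, roughly:
>>
>> # umount -f /var/jail/wds1-1a
>> # mount -t nullfs -o ro /jail/base /var/jail/wds1-1a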
>>
>> I have also found that if I copy the jail root to local storage and
>> update the jail's fstab to nullfs-mount that copy, the problem seems
>> to go away. This leads me to believe that the issue is related to the NFS
>> source for the nullfs mount. statd and lockd are both running on the
>> host.
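>>
>> For illustration, with the local copy at /usr/local/jail/base (the
>> path here is just an example), the working variant is roughly:
>>
>> # rsync -a /jail/base/ /usr/local/jail/base/
>>
>> and the first line of the jail fstab becomes:
>>
>> /usr/local/jail/base /var/jail/wds1-1a nullfs ro 0 0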
>>
>> My relevant configurations are below. I can provide any other
>> information desired.
>>
>> # Host fstab line for jail root.
>> #
>> 10.219.212.1:/vol/dev/wds/jail_base  /jail/base nfs ro    0    0
>>
>>
>> # Jail fstab file (mount.fstab)
>> #
>> /jail/base /var/jail/wds1-1a nullfs ro 0 0
>> # writable (UFS-backed) /var
>> /var/jail-vars/wds1-1a /var/jail/wds1-1a/var nullfs rw 0 0
>>
>>
>> # jail.conf file
>> #
>> * {
>>     devfs_ruleset = "4";
>>     mount.devfs;
>>     exec.start = "/bin/sh /etc/rc";
>>     exec.stop = "/bin/sh /etc/rc.shutdown";
>>     interface = "vmx1";
>>     allow.dying = 1;
>>     exec.prestart = "/usr/local/bin/rsync -avC --delete /jail/${image}/var/ /var/jail-vars/${host.hostname}/";
>> }
>>
>> # JMANAGE wds1-1a
>> wds1-1a {
>>     path = "/var/jail/wds1-1a";
>>     ip6.addr = "2620:1:1:1:1a::1";
>>     host.hostname = "wds1-1a";
>>     host.domainname = "dev";
>>     mount.fstab = "/var/local/fstab.wds1-1a";
>>     $image = "base";
>> }
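>>
>> For completeness, the jail is created and removed against this config
>> with jail(8), roughly:
>>
>> # jail -f /var/local/jail.conf -c wds1-1a
>> # jail -f /var/local/jail.conf -r wds1-1a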
>
>
> What happens if you take jails out of the equation?  I know this isn't
> entirely a non-jail issue, but I wonder if a jail is required for the mount
> point to be un-re-mountable.  I've heard before of NFS-related problems
> where a jail remains dying forever, but this has been more of an annoyance
> than a real problem.
>
> It's not so much that I want to absolve jails, as I want to see where the
> main fight exists.  It's tricky enough fixing an interface between two
> systems, but we've got three here.
>
> - Jamie

