[Bug 209112] /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Apr 27 20:51:51 UTC 2016


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209112

            Bug ID: 209112
           Summary: /usr/sbin/jail jails fail to launch with possible race
                    when jails mount common dir with nullfs
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: agifford at infowest.com

On a host with multiple jails (configured using /etc/jail.conf) that each mount
a common directory read-only inside the jail using nullfs, some jails will
randomly fail to launch because the nullfs mount fails. The failure is silent.
This also occurs on FreeBSD 10.2 and possibly earlier.

DETAILS:

I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a common
read-only base system. On reboot, only one or two of the three will start. The
first jail listed in /etc/jail.conf will usually launch just fine, but
subsequent ones fail, and I cannot predict which ones will succeed or fail.
There are NO logs indicating the reason for failure on the main system, nor in
the jails' individual console log files.

To track down the problem, I added some debug logging to the /etc/rc.d/jail
script, and some exec.prestart/exec.poststart lines to my jail.conf
configuration:

/etc/jail.conf:

jail1 {
  host.hostname  = "jail1.example.org";
  path  = "/usr/local/jail/jail1";
  ip4.addr  = 127.0.0.11;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}

jail2 {
  host.hostname  = "jail2.example.org";
  path  = "/usr/local/jail/jail2";
  ip4.addr  = 127.0.0.12;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}

jail3 {
  host.hostname  = "jail3.example.org";
  path  = "/usr/local/jail/jail3";
  ip4.addr  = 127.0.0.11;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}


In the /etc/rc.d/jail script, in the _ALL branch of the case statement in the
jail_start() function, I added the following to capture the contents of the
$_tmp file on error/failure:

echo  "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
cat $_tmp >> /tmp/DEBUG
echo  "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG

I rebooted the FreeBSD 10.3 system.  Only a SINGLE jail started, the first one.
The contents of /tmp/DEBUG showed me:

PRESTART_jail1
POSTSTART_jail1
DEBUG: Contents of '/tmp/jail.hyLntGie' are:
mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
jail1: created
DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS

Ah ha!  The nullfs mounting failed!
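
For anyone who wants to pull the failed mount points out of a capture like the
one above, here is a minimal sketch. The function name failed_nullfs_mounts is
hypothetical (it is not part of /etc/rc.d/jail), and it assumes the error text
matches the "Operation not supported by device" lines shown in this report.

```shell
# Hypothetical helper, not part of the stock rc.d/jail script.
# Prints the mount point from every mount_nullfs failure line in the file
# given as $1, assuming the message format shown in this report.
failed_nullfs_mounts() {
    sed -n 's/^mount_nullfs: \(.*\): Operation not supported by device$/\1/p' "$1"
}
```

Run as "failed_nullfs_mounts /tmp/DEBUG", it would list one failed mount point
per line.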

BUG #1: Apparently the /usr/sbin/jail command attempts to launch jails in
parallel, and the parallel mounts of the common directory appear to contend for
some file system resource.

And unfortunately the failure was SILENT!  No logs!


BUG #2: The /etc/rc.d/jail script is NOT LOGGING the failure information!
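
As an illustration of what logging that output could look like, here is a
minimal sketch that sends each line of the captured file to syslog with
logger(1). The function name log_jail_output and the LOG_CMD override are
assumptions made for this example; this is not what /etc/rc.d/jail does today,
and the tag and priority are arbitrary choices.

```shell
# Hypothetical helper, not part of the stock rc.d/jail script.
# Sends every line of the file given as $1 to syslog via logger(1).
# LOG_CMD is an assumed override so the command can be swapped for testing.
log_jail_output() {
    _file=$1
    # Nothing to do if the capture file is missing or empty.
    [ -s "$_file" ] || return 0
    while IFS= read -r _line; do
        # Intentionally unquoted so the default expands to a command plus flags.
        ${LOG_CMD:-logger -t jail -p daemon.err} "$_line"
    done < "$_file"
}
```

In the script this would be called as log_jail_output "$_tmp" at the point
where the debugging lines above were added.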


WORKAROUND FOR THE INTERIM:

I can force /usr/sbin/jail to launch my jails sequentially by adding a
"depend =" line to each jail's section in /etc/jail.conf, like:

jail2 {
  ...
  depend = jail1;
  ...
}

jail3 {
  ...
  depend = jail2;
  ...
}


This strikes me as a very brittle work-around.  And if one jail fails to launch
for some other reason, all subsequent jails would fail.
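
A possible alternative that avoids chaining the jails together is to start each
one with its own "jail -c" invocation, so one failure does not block the rest.
This is only a sketch: the function name start_jails_sequentially and the
JAIL_CMD override are assumptions for illustration, and I have not verified
that it avoids the mount failure described here.

```shell
# Hypothetical helper, not part of the base system.
# Starts each named jail with a separate "jail -c" call, one after another.
# JAIL_CMD is an assumed override that allows a dry run.
start_jails_sequentially() {
    _failed=""
    for _j in "$@"; do
        # Intentionally unquoted so the default expands to a command plus flag.
        if ! ${JAIL_CMD:-jail -c} "$_j"; then
            echo "jail $_j failed to start" >&2
            _failed="$_failed $_j"
        fi
    done
    # Succeed only if every jail started.
    [ -z "$_failed" ]
}
```

For the configuration above it would be called as
"start_jails_sequentially jail1 jail2 jail3".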

The best solution would be to eliminate whatever resource contention is going
on here.

Google searches revealed a jail_parallel_start=NO rc.conf variable, but it
appears to relate to the /etc/rc.d/jail script doing things in parallel.
In this bug, it is the /usr/sbin/jail command, executing as a single process,
that is likely doing things in parallel (or perhaps sequentially, but quickly
enough that the nullfs mounts still contend for some resource) unless the
depend= settings are included.

Thanks for your help!

Aaron out.
