RFC: enhancing the root mount logic

Mon Aug 23 23:13:16 UTC 2010

In message: <AFBE2FCA-30A6-4E1D-A964-AC4DC4C843EB at juniper.net>
            Marcel Moolenaar <marcelm at juniper.net> writes:
: All,
: 
: In embedded products, software is possibly installed as an image onto
: an actual storage device. This means that mounting the storage device
: as root is not enough to have a usable root file system. The rough
: draft below is an idea to enhance the root mount from having ad-hoc
: quirks to a well-defined and recursive mechanism to allow a wide-
: range of use cases.
: 
: The root mount logic is recursive as follows:
: 1.  The kernel mounts devfs as root (as it is now).
: 2.  The kernel will re-mount root by virtue of reading a file, called
:     /.mount.conf, in the current root file system and following the
:     directives in it. devfs synthesizes the contents of this file.
: 
: At each iteration, the kernel will:
: 1.  move the devfs mount from /dev in the old file system to /dev in
:     the new file system.
: 2.  As per the directives or unconditionally, the kernel will re-mount
:     the old root file system under /.mount (or some other name) within
:     the new file system.
: 
: devfs will synthesize the contents of /.mount.conf as per the kernel
: configuration and tunables. The administrator (or install process)
: will create and populate /.mount.conf for all other cases.
: 
: Directives in /.mount.conf are envisioned to be something like:
: 
:    {FS}:{MOUNTPOINT}	e.g.	ufs:/dev/da0
: 	a root mount alternative. The order of the alternatives in
: 	the file determines the priority.
: 
:    .ask
: 	a root mount alternative that asks the operator to specify
: 	what the root mount should be.
: 
:    .wait N			e.g.	.wait 5
: 	wait at most N seconds for a root mount alternative to
: 	succeed. If an alternative does not succeed within that
: 	time, move on to the next alternative.
: 
:    .onfail	{panic|reboot|retry|continue}
: 	Tells the kernel what to do in case it can't successfully
: 	complete the root mount as directed to.
: 
: The .wait directive works better (probably) if we have events that
: signify the arrival of a file system or device special file, so that
: we can wait for at most N seconds after the last event. This also
: allows us to wait for a separate interval between events.
: 
: As an example, consider:
: 
:    [devfs]	/.mount.conf:
: 	ufs:/dev/da0
: 	.ask
: 	.wait 5
: 	.onfail panic
: 
:    [ufs:/dev/da0]	/.mount.conf
: 	md0:/images/OS-image-1.0.iso
: 	unionfs:/jail/freebsd-8-stable
: 	.wait 0
: 	.onfail continue
: 
: In the example, the kernel will mount devfs, read /.mount.conf and
: wait at most 5 seconds to mount the UFS on /dev/da0. If that fails,
: the kernel will ask (once) and panic in case of failure.
: 
: If the UFS root mount succeeded, the kernel will re-mount devfs
: underneath /dev. Since this is the first non-devfs root file system,
: the kernel will not re-mount the old root under /.mount.
: 
: Since there's a /.mount.conf on the UFS, the kernel will read it
: and repeat the process. First it'll try and mount the OS image
: in /images/OS-image-1.0.iso and if it's not present will try to
: mount some -stable 8 chroot using unionfs (not necessarily a
: real-world example here :-) If either fails, the kernel will
: continue booting using the current root file system. Assuming that
: the image is present, the kernel will re-mount root, move devfs
: underneath /dev in the MD root and remount ufs:/dev/da0 under
: /.mount in the MD root. This gives the following picture:
: 
: /		md0:[ufs:/dev/da0]/images/OS-image-1.0.iso
: /.mount		ufs:/dev/da0
: /dev		devfs
: 
: 
: Things not explicitly touched upon:
: o   root mount options
: o   directives to instruct the kernel what to run as the initial
:     process to eliminate the rather ad-hoc hardcoding. E.g.:
: 	.init /sbin/init
: 	.init /sbin/init.old
: 
: Is this something that people feel is worth fleshing out and
: prototyping?
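
To make the quoted directive format concrete, here is a minimal parsing
sketch in Python. It is not from the proposal and is not kernel code. The
names MountConf and parse_mount_conf are hypothetical, and it assumes that
.wait and .onfail apply to the whole file regardless of where they appear
(which is how the quoted example reads). The .init idea is left out, since
the proposal lists it as not yet worked out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Actions named in the proposal for ".onfail".
ONFAIL_ACTIONS = {"panic", "reboot", "retry", "continue"}


@dataclass
class MountConf:
    # Root mount alternatives in priority (file) order.
    # ".ask" is recorded as the pair (".ask", "").
    alternatives: List[Tuple[str, str]] = field(default_factory=list)
    wait: Optional[int] = None    # seconds, from ".wait N"
    onfail: Optional[str] = None  # action, from ".onfail ACTION"


def parse_mount_conf(text: str) -> MountConf:
    """Parse the contents of a /.mount.conf file (hypothetical helper)."""
    conf = MountConf()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            parts = line.split()
            name, args = parts[0], parts[1:]
            if name == ".ask" and not args:
                conf.alternatives.append((".ask", ""))
            elif name == ".wait" and len(args) == 1 and args[0].isdigit():
                conf.wait = int(args[0])
            elif (name == ".onfail" and len(args) == 1
                  and args[0] in ONFAIL_ACTIONS):
                conf.onfail = args[0]
            else:
                raise ValueError(f"bad directive: {line!r}")
        else:
            # "{FS}:{...}", split at the first colon only.
            fs, sep, spec = line.partition(":")
            if not sep or not fs or not spec:
                raise ValueError(f"bad mount alternative: {line!r}")
            conf.alternatives.append((fs, spec))
    return conf


example = "ufs:/dev/da0\n.ask\n.wait 5\n.onfail panic\n"
conf = parse_mount_conf(example)
print(conf.alternatives, conf.wait, conf.onfail)
```

Keeping .ask in the same ordered list as the {FS}:... entries preserves the
rule that file order determines priority.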

This sounds very interesting.  If kept simple, I could see how this
would make my life a lot easier.

However, all this scripting sounds a bit like a very simple shell in
the kernel.  What advantages are there to this approach vs having the
ability to run a simple shell script or executable and "pivot" the root
to a new location?  And how do you emulate the mount_foo programs for
foo filesystems?  Some of them do weird things that might not
translate well into the kernel...

As you can see, I'm torn about how I feel about the idea.  For simple
cases, I think it is great, but as complexity builds, I become less
sure.  What if that iso image was compressed?  What if I had a
software RAID of disks or flash devices?  What about crypto?  I know I
can handle those cases in /bin/sh, but will each new one require more
code in the kernel?  What would df and/or mount tell you about the
now-hidden file systems?
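
To make the df/mount question concrete, here is a toy Python model of the
mount table after each iteration, following only the rules in the quoted
proposal: devfs moves to /dev in the new root, and the old root is
re-mounted under /.mount unless the old root is the initial devfs. The
function name remount_root is hypothetical. How mounts nested under an
older root should be carried along is not specified in the proposal; the
sketch assumes they follow the old root under /.mount.

```python
def remount_root(table, new_root):
    """Return the mount table after one root re-mount iteration.

    table maps mount point -> description of what is mounted there.
    """
    old_root = table["/"]
    new_table = {"/": new_root, "/dev": "devfs"}
    if old_root != "devfs":
        # The old root stays reachable under /.mount in the new root.
        new_table["/.mount"] = old_root
        # Assumption: anything else mounted in the old root (other than
        # /dev, which was moved) is carried along beneath /.mount.
        for mount_point, what in table.items():
            if mount_point not in ("/", "/dev"):
                new_table["/.mount" + mount_point] = what
    return new_table


table = {"/": "devfs"}
table = remount_root(table, "ufs:/dev/da0")
table = remount_root(table, "md0:/images/OS-image-1.0.iso")
for mount_point in sorted(table):
    print(f"{mount_point:10} {table[mount_point]}")
```

Under these assumptions the result has the same three entries as the
picture in the proposal, so nothing is hidden after two iterations; a third
iteration would place the UFS at /.mount/.mount.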

Warner