On ZFS and 64/32 dual-booting.

Zaphod Beeblebrox zbeeble at gmail.com
Sat Mar 8 19:53:48 UTC 2008

Since there are still reasons to dual boot between i386 and amd64 on FreeBSD
(kernel modules like the nvidia driver only exist for i386, 4G memory only
usable in amd64), I set a simple goal for myself: find a good way to dual
boot with zfs.

Some traditional things (like sharing /usr/share and /usr/ports) hardly
matter anymore --- but they're easy.  I didn't share /usr/local/share as the
installed set of packages on each platform was not necessarily the same.
Besides, with dual 320 Gig drives in a laptop, does that kind of space even
really matter any more?

I briefly considered making a zfs root, but there are several reasons not
to: 1) root hardly changes anyway (not a lot of benefit), 2) backing up root
is easy (not much benefit) and 3) we need to point to separate /usr and /var
partitions when booted in each mode.  So I have two root partitions, ad8s1a
and ad8s2a.  I've labeled the filesystems so they show up as /dev/ufs/root32
and /dev/ufs/root64.
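
A minimal sketch of the labeling step, assuming tunefs -L is run against
each root while it is not mounted read-write, and assuming (from the slice
layout listed further down) that ad8s1a holds root64 and ad8s2a holds
root32.  The run helper is hypothetical and only prints each command unless
DRY_RUN=no is set.

```shell
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Give each UFS root a volume label; labels appear as /dev/ufs/<label>.
# The device-to-label mapping is an assumption based on the slice sizes.
label_roots() {
    run tunefs -L root64 /dev/ad8s1a
    run tunefs -L root32 /dev/ad8s2a
}

# Emit the fstab entry that mounts a labeled root on /.
fstab_root_line() {
    printf '/dev/ufs/%s\t/\tufs\trw\t1\t1\n' "$1"
}

label_roots
fstab_root_line root32
```

Each install's /etc/fstab can then name its own root by label rather than by
device node.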

For further reference my zfs pool is 'canoe' (workstations at one company
were classes of warships, a canoe is a portable warship --- and I kept the

=== Sorting out the Symlinks ===

I've always followed the tradition of mounting foreign filesystems as
/d/<machine>/<mount> such that /d/myself is a symlink to /.  Meaning that
symlinks work on other machines because you create a link to
/d/myself/usr/foo rather than /usr/foo.  Same here.  On the root32, create
/d/64 (where root64 mounts) and create /d/32 as a symlink to /.

Now... on root32, /usr is a symlink to /canoe/32/usr and /var is a symlink
to /canoe/32/var.

Similarly, on root64, /usr points to /canoe/64/usr and /var to /canoe/64/var.
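
The link layout for root32 can be sketched as follows.  To keep it safe to
try, this builds the links under a scratch directory (ROOT, a hypothetical
stand-in for the real root filesystem) rather than touching / itself; the
function name make_links is also just for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for the root32 filesystem (assumption).
ROOT=$(mktemp -d)

# make_links <root> <this-arch> <other-arch>
make_links() {
    root=$1
    arch=$2
    other=$3
    mkdir -p "$root/d/$other"              # mount point for the other root
    ln -s / "$root/d/$arch"                # /d/<self> is a symlink to /
    ln -s "/canoe/$arch/usr" "$root/usr"   # usr lives on the zfs pool
    ln -s "/canoe/$arch/var" "$root/var"   # var lives on the zfs pool
}

make_links "$ROOT" 32 64
ls -l "$ROOT" "$ROOT/d"
```

Swapping the last two arguments (64 32) gives the mirror-image layout for
root64.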

=== Things the same only different ===

For /usr/ports and /usr/src, it seems fine to set the zfs mountpoint to
/usr/ports and /usr/src.  They seem to mount there through all the symlinks
just fine.  In hindsight, I could have done that for /usr/share, but
currently, /canoe/32/usr/share links to /canoe/64/usr/share --- so you can
use either approach for shared data.  There are mild arguments for having at
least /usr/ports as a separate zfs filesystem --- not the least of which is
the keeping of snapshot backups (or not).
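
Roughly, the shared filesystems can be set up as below.  This is a sketch,
not a record of the exact commands used: the dataset names canoe/ports and
canoe/src match the listing at the end of this post, and the hypothetical
run helper prints commands by default instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Create one filesystem per shared tree and mount it at its usual path.
make_shared() {
    for name in ports src; do
        run zfs create "canoe/$name"
        run zfs set "mountpoint=/usr/$name" "canoe/$name"
    done
}

make_shared
```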

Interestingly, the zfs guide recommends creating /usr/ports and then
/usr/ports/distfiles as separate filesystems and then setting compression on
/usr/ports and not on /usr/ports/distfiles.  Besides the nasty pauses in
interactivity that zfs compression causes, compressing /usr/ports as a
filesystem is futile.  A stock copy of /usr/ports (with no ports built)
gets a compression ratio of only 1.06 from zfs using gzip-9 compression.  On
the other hand, /usr/src compresses about 3.5 times.  Again, this only
matters if you care about disk space at all.
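
For anyone who wants to repeat the measurement, the guide's layout and the
ratio check look something like this.  It is a sketch under the assumption
that canoe/ports and canoe/src already exist; the run helper is hypothetical
and prints rather than executes by default.

```shell
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Compress the ports tree but not the already-compressed distfiles.
set_compression() {
    run zfs create canoe/ports/distfiles
    run zfs set compression=gzip-9 canoe/ports
    run zfs set compression=off canoe/ports/distfiles
}

# Report the achieved ratio once data has been written.
show_ratios() {
    run zfs get compressratio canoe/ports canoe/src
}

set_compression
show_ratios
```

Note that compression only applies to blocks written after the property is
set, so the tree has to be repopulated before the ratio means anything.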

Another small thing you might think about sharing is /usr/sup.  If you use
cvsup, this little directory contains the data pertaining to what you have
checked out in your tree.  I'm not positive of the consequences of moving
back and forth between two installs sharing /usr/src and /usr/ports while
changing the /usr/sup in use --- but it seems like a good idea to share it.

One concession to all this is that fetchmail checks /d/32/var/mail and
/d/64/var/mail.  I suppose I could have created a zfs /var/mail as well, but
the simple workaround was in place before I moved var onto zfs.

=== On to Userland ===

Another standard from long ago in the dark days of my history is to put users on
/u rather than /usr/home --- less typing, etc.  This used to lead to /u1,
/u2, ...

On our dual-boot machine I create /u and /u/user.  I'm a little nervous
about software keeping data that isn't architecture-clean in your home
directory.  I use spamprobe (a multi-word Bayesian filter) which stores a
Berkeley DB in ~/.spamprobe.  Obviously you _want_ to share this data.
Currently the problem is a moot point as spamprobe didn't build in 64 bit
and I ended up copying the 32 bit binary to the 64 bit side --- so I don't
yet know if this DB is safe.  Someone told me that it would compile 64 bit
now --- but I haven't tried.  Inertia.

The large apps in my day --- emacs, firefox, thunderbird, xchat --- all seem
to keep their data in an architecture-independent way.  At least, I haven't
had a problem with an app that I can remember.

I have also found it handy to create a few sub-user zfs filesystems --- for
.wine and "emu" so far.  .wine holds the obvious windoze things and "emu"
holds disk images for qemu emulations that I run to test some new kernel
modules I'm working on --- both of these would generate significant churn on
my snapshot size.  I'm also considering a sub-filesystem for my mp3s ---
but I'm a little undecided as to how to effectively share them with the copy
that is on the fileserver for when the laptop is not in either BSD mode.
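
The reason the sub-filesystems help is that a plain (non-recursive) zfs
snapshot covers only the named filesystem, not its children.  A sketch,
assuming the child directories do not already exist with data in them; the
run and stamp helpers are hypothetical, and the snapshot name format simply
imitates the one visible in the listing at the end.

```shell
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Timestamp in the YYYYMMDD-HHMM style used for snapshot names.
stamp() {
    date +%Y%m%d-%H%M
}

# Split the high-churn directories into their own filesystems.
make_subfs() {
    for sub in .wine emu; do
        run zfs create "canoe/u/sam/$sub"
    done
}

# Snapshot the home directory only; children are left out.
snapshot_home() {
    run zfs snapshot "canoe/u/sam@$(stamp)"
}

make_subfs
snapshot_home
```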

=== What doesn't work OOTB ===

The startup scripts for ZFS are still a little green.  One issue is that the
startup script 'requires' mountcritlocal --- I assume so that its own
filesystems will mount on top of other local UFS ones.  At least in my case,
this is backwards.  I need zfs to run BEFORE mountcritlocal and BEFORE
mdconfig.  I have changed my require line to 'root hostid' ... since it's
good to have the hostid already set and having root r/w is also good.  I
don't think I've solved the "BEFORE" problem, but my requirements might make
it into the CVS tree.
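
For illustration, the ordering keywords at the top of a locally modified
/etc/rc.d/zfs might read as below.  The REQUIRE line is the change described
above; the BEFORE line is only an assumption about how the remaining
ordering could be expressed with rcorder(8) keywords, and it is untested.

```shell
#!/bin/sh
#
# Header of a locally modified /etc/rc.d/zfs (sketch only; the body of the
# script is unchanged and omitted here).
#
# PROVIDE: zfs
# REQUIRE: root hostid
# BEFORE: mountcritlocal mdconfig
```

Running "rcorder /etc/rc.d/*" afterwards prints the resulting order, which
is one way to check whether zfs now sorts ahead of mountcritlocal.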

This dependency issue is an interesting one.  I assume that the fstab code
makes sure that filesystems are mounted in a sane order ... or maybe it's
just the order in the file itself --- I've never had a problem, so I don't
know.  However, having this information in two places poses an immediate
problem... one person might have a ufs /usr and a zfs /usr/ports and another
might have a zfs /usr and a ufs or nfs /usr/home.  Calling zfs mount -a
either before or after mountcritlocal isn't going to make everyone happy.
Maybe it needs to be called both times?  I dunno.  I dunno if zfs can fail
gracefully when things it needs aren't mounted yet.

Now... the "hostid" for my machine is the same on both 32 and 64 bit.  This
might not be the default if your machine ran uuidgen, but my laptop does
have a uuid in its bios env --- so I got off easy there.  However, I've
found I still needed to add "zpool import -f canoe" to the start
function because something about the zfs cache file (or something) isn't
entirely happy about the dual-boot process.  I could have tried sharing the
zpool cache --- but I didn't have any idea what the consequences would be
and the zpool import -f worked.  It might be an idea to add an rc.conf
variable along the lines of zfs_zpool_force="canoe" (which ends up calling
"zpool import -f canoe").

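A sketch of how the start function could honor such a variable.  Note that
zfs_zpool_force is the hypothetical knob proposed above, not something the
stock rc scripts know about, and the run helper is likewise made up so the
loop can be tried without touching a real pool.

```shell
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Would normally come from rc.conf; space-separated list of pool names.
zfs_zpool_force="canoe"

# Force-import every listed pool before the zfs filesystems are mounted.
force_import_pools() {
    for pool in ${zfs_zpool_force}; do
        run zpool import -f "$pool"
    done
}

force_import_pools
```

Leaving the variable empty makes the loop a no-op, so the default behavior
would stay as it is today.
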
=== YMMV ===

What I haven't detailed here is how you bootstrap this all.  In my case, I
used Partition Magic to shrink and move around the windoze partition (the
laptop is in fact quadruple booted xp/vista/FreeBSD-32/FreeBSD-64).  There
also obviously isn't a good solution for zfs in XP/Vista.  Sharing files
with those OSs requires that I use fuse and its ntfs and/or SMB mounted
filesystems from other computers on the network.  This is imperfect at
best.  It's mitigated by the fact that I generally only play games there ---
it would be much worse if I were trying to get work done.

Anyways... the bootstrap of the FreeBSD world is much like the bootstrap for
root-on-gmirror (see the handbook).  In my case, I had two regular FreeBSD
installs with a zfs for /u for a while and then I backed up the /usr, /var
(both 32 and 64) and the zfs pool (using zfs send) and then I made a much
larger zfs pool and repopulated it.
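
The zfs send part of that migration could look something like the following
sketch.  The old pool name (oldpool), the snapshot name (migrate) and the
/backup directory are all hypothetical, and the script only prints the
commands so they can be reviewed and then run by hand.

```shell
#!/bin/sh
# dump_file <dir> <dataset>: file name used to hold one send stream.
dump_file() {
    echo "$1/$(echo "$2" | tr '/' '_').zfs"
}

# backup_cmds <dir> <snapshot-name> <dataset>...
# Print the snapshot and send commands for each dataset.
backup_cmds() {
    dir=$1
    snap=$2
    shift 2
    for ds in "$@"; do
        echo "zfs snapshot $ds@$snap"
        echo "zfs send $ds@$snap > $(dump_file "$dir" "$ds")"
    done
}

# restore_cmd <dir> <old-dataset> <new-dataset>
# Print the receive command that repopulates the new pool.
restore_cmd() {
    echo "zfs receive $3 < $(dump_file "$1" "$2")"
}

backup_cmds /backup migrate oldpool/u
restore_cmd /backup oldpool/u canoe/u
```

The receive step assumes the target filesystem does not exist yet on the new
pool.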

Another method to approach this would be to run a regular minimal install
onto a 1G root (it fits).  I also have my swap outside of zfs --- so in my
case, I have 4 fdisk partitions on each disk:

disk 1: 80M, 99G, <blank>, 205G (dell diagnostics, Windows XP, No partition,
ZFS + swap)
disk 2: 9G, 1G, 99G, 197G (root64 + swap, root32, Vista, ZFS)

The swap partitions (ad4s4b and ad8s1b) are shared by both FreeBSD systems.
I used glabel to make this easy --- calling them /dev/label/swap[12]
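
A sketch of the glabel step and the matching fstab lines.  Which device got
which label is an assumption (swap1 for ad4s4b, swap2 for ad8s1b, going by
the order they are listed), and the run helper is hypothetical and prints
commands by default.

```shell
#!/bin/sh
# Hypothetical helper: print commands unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Label both swap partitions; they then appear under /dev/label/.
label_swap() {
    run glabel label swap1 /dev/ad4s4b
    run glabel label swap2 /dev/ad8s1b
}

# fstab lines that work unchanged on both the 32 and 64 bit installs.
swap_fstab_lines() {
    printf '/dev/label/%s\tnone\tswap\tsw\t0\t0\n' swap1 swap2
}

label_swap
swap_fstab_lines
```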

If you run a minimal install to each root, then move the usr and var onto
zfs, you can then run a "make world" to fill out all the missing files.

=== In praise of ZFS ===

So why do all of this?  My shortlist:

   1. Regardless of the filesystem involved, I like to use at least
   RAID1.  Disks are cheap and disks fail.
   2. The ability to hand off snapshots to another running system is very
   cool.  It allows me to browse a filesystem that's (mostly) up-to-date when
   my laptop is not online.
   3. Conceptually, I like having many filesystems and filesystem
   divisions.  /, /usr, /var, /u (a minimal set), but I dislike having to waste
   space in one filesystem when I might need it in another.
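
The snapshot hand-off in point 2 can be sketched as an incremental send
piped over ssh.  The host name (fileserver), the target dataset (tank/sam)
and the snapshot names in the example call are hypothetical; the function
only prints the pipeline so it can be checked before being run.

```shell
#!/bin/sh
# handoff_cmd <dataset> <prev-snap> <new-snap> <host> <target-dataset>
# Print a pipeline that sends only the changes between two snapshots.
handoff_cmd() {
    ds=$1
    prev=$2
    new=$3
    host=$4
    target=$5
    echo "zfs send -i $ds@$prev $ds@$new | ssh $host zfs receive $target"
}

handoff_cmd canoe/u/sam monday tuesday fileserver tank/sam
```

This assumes the receiving side already holds the earlier snapshot and has
not been modified since it arrived.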

=== Forward Thinking ===

   1. While my laptop may be one of the few to have two drives, it would
   be cool to have a ZFS plugin that would shut down both drives.  Then, when it
   came time to write a blob, wake only one drive, write the blob, and stop the
   drive again.  Then, when it comes to the next point, wake the other drive,
   write the first blob, the new second blob, and then shut down.  And so
   forth.  Similarly, it might be an idea to preferentially trigger a flush of
   the blob at the end of a read --- since one or more of the drives would have
   spun up for that.
   2. Dependencies of /etc/rc.d/zfs need rethinking.
   3. Potentially, you could now have Solaris, OpenSolaris and FreeBSD
   32/64 --- 4 OSs that support ZFS on the same computer.  While there may be
   reasons to boot multiple OSs on the system, it's also possible through
   installing the same packages and mounting a common (zfs) filesystem that the
   services on that computer remain the same.  ZFS mounting on multiple OSs on
   the same computer needs thought.
   4. (non-zfs related) It seems to me that most of the 64 bit systems
   have depended heavily on 32 bit binaries for many things.  "ls" doesn't need
   a 64 bit address space, for instance.  We haven't really looked at this much
   in FreeBSD, but it could cut the size of a 64 bit system down a peg if it
   commonly ran 32 bit binaries (rather than that being the exception).

=== Data ===

Just in case you need to visualize, here's some slightly sanitized output
from my system:

[2:2:302]sam@canoe:~> zfs list
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
canoe                                 45.0G   144G    21K  /canoe
canoe/32                              4.52G   144G    21K  /canoe/32
canoe/32@20080307-1541                  16K      -    21K  -
canoe/32/usr                          4.46G   144G  4.43G  /canoe/32/usr
canoe/32/usr@20080307-1541            30.6M      -  4.45G  -
canoe/32/usr/obj                        18K   144G    18K  /canoe/32/usr/obj
canoe/32/var                          63.7M   144G  63.2M  /canoe/32/var
canoe/32/var@20080307-1644             557K      -  63.2M  -
canoe/64                              6.30G   144G    21K  /canoe/64
canoe/64/usr                          4.96G   144G  4.69G  /canoe/64/usr
canoe/64/usr@20080307-1541             268M      -  4.76G  -
canoe/64/usr/obj                        18K   144G    18K  /canoe/64/usr/obj
canoe/64/var                          1.34G   144G  95.8M  /canoe/64/var
canoe/64/var@20080307-1541            1.25G      -  1.33G  -
canoe/ports                           2.31G   144G  2.29G  /usr/ports
canoe/ports@20080307-1541             18.9M      -  2.29G  -
canoe/ports/distfiles                   18K   144G    18K
canoe/src                             2.16G   144G  2.15G  /usr/src
canoe/src@20080307-1541               8.93M      -  2.15G  -
canoe/sup                             28.7M   144G  28.7M  /usr/sup
canoe/u                               29.6G   144G    19K  /u
canoe/u/sam                           29.6G   144G  26.1G  /u/sam
canoe/u/sam@20080307-1643             17.0M      -  26.1G  -
canoe/u/sam/.wine                     2.95G   144G  2.95G  /u/sam/.wine
canoe/u/sam/.wine@20080307-1541         32K      -  2.95G  -
canoe/u/sam/emu                        593M   144G   593M  /u/sam/emu
canoe/u/sam/emu@20080307-1541             0      -   593M  -
[2:3:303]sam@canoe:~> df -h
Filesystem                Size    Used   Avail Capacity  Mounted on
/dev/ufs/root32           993M    226M    687M    25%    /
devfs                     1.0K    1.0K      0B   100%    /dev
canoe                     144G      0B    144G     0%    /canoe
canoe/32                  144G      0B    144G     0%    /canoe/32
canoe/32/usr              148G    4.4G    144G     3%    /canoe/32/usr
canoe/32/usr/obj          144G      0B    144G     0%    /canoe/32/usr/obj
canoe/32/var              144G     63M    144G     0%    /canoe/32/var
canoe/64                  144G      0B    144G     0%    /canoe/64
canoe/64/usr              149G    4.7G    144G     3%    /canoe/64/usr
canoe/64/usr/obj          144G      0B    144G     0%    /canoe/64/usr/obj
canoe/64/var              144G     96M    144G     0%    /canoe/64/var
canoe/u                   144G      0B    144G     0%    /u
canoe/u/sam               170G     26G    144G    15%    /u/sam
canoe/u/sam/.wine         147G    2.9G    144G     2%    /u/sam/.wine
canoe/u/sam/emu           145G    593M    144G     0%    /u/sam/emu
canoe/ports               146G    2.3G    144G     2%    /usr/ports
canoe/ports/distfiles     144G      0B    144G     0%
canoe/src                 146G    2.2G    144G     1%    /usr/src
canoe/sup                 144G     29M    144G     0%    /usr/sup
/dev/ufs/root64           989M    347M    563M    38%    /d/64
/dev/md0                  1.9G     28K    1.8G     0%    /tmp
[2:4:304]sam@canoe:~> zpool status
  pool: canoe
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        canoe       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s4d  ONLINE       0     0     0
            ad8s4d  ONLINE       0     0     0

errors: No known data errors
[2:5:305]sam@canoe:~> zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
canoe                   192G   45.0G    147G    23%  ONLINE     -
[2:6:306]sam@canoe:~> pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/label/swap1   7994896        0  7994896     0%
/dev/label/swap2   8388604        0  8388604     0%
Total            16383500        0 16383500     0%

More information about the freebsd-hackers mailing list