On ZFS and 64/32 dual-booting.
Zaphod Beeblebrox
zbeeble at gmail.com
Sat Mar 8 19:53:48 UTC 2008
Since there are still reasons to dual boot between i386 and amd64 on FreeBSD
(kernel modules like the nvidia driver only exist for i386, while more than 4G
of memory is only usable on amd64), I set a simple goal for myself: find a good
way to dual boot with ZFS.
Some traditional things (like sharing /usr/share and /usr/ports) hardly
matter anymore --- but they're easy. I didn't share /usr/local/share as the
installed set of packages on each platform was not necessarily the same.
Besides, with dual 320 GB drives in a laptop, does that kind of space even
really matter any more?
I briefly considered making a ZFS root, but there are several reasons not to:
1) root hardly changes anyway (not a lot of benefit), 2) backing up root is
easy (not much benefit) and 3) we need to point to separate /usr and /var
partitions when booted in each mode. So I have two root partitions, ad8s1a
and ad8s2a. I've labeled the filesystems so they show up as /dev/ufs/root32
and /dev/ufs/root64.
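For reference, putting those labels on the UFS roots is just a tunefs call per
filesystem (a minimal sketch using my slice names; run it while the filesystem
is unmounted or mounted read-only):

  # label each root filesystem; the labels then show up as /dev/ufs/root32
  # and /dev/ufs/root64 (per the disk layout further down, ad8s1a carries the
  # 64-bit root and ad8s2a the 32-bit one)
  tunefs -L root64 /dev/ad8s1a
  tunefs -L root32 /dev/ad8s2a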
For further reference, my ZFS pool is 'canoe' (workstations at one company
were classes of warships; a canoe is a portable warship --- and I kept the
name).
=== Sorting out the Symlinks ===
I've always followed the tradition of mounting foreign filesystems as
/d/<machine>/<mount>, with /d/myself as a symlink to /. That way symlinks
work on other machines, because you create a link to /d/myself/usr/foo
rather than /usr/foo. Same here. On root32, create /d/64 (where root64
mounts) and make /d/32 a symlink to /.
Now... on root32, usr is /canoe/32/usr and var is /canoe/32/var
Similarly, on root64, usr is /canoe/64/usr and var is /canoe/64/var
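Roughly, the setup looks like this (a sketch; how each root's /usr and /var
actually point into the /canoe trees --- symlinks or fstab entries --- is left
out here):

  # on the 32-bit root: /d/64 is where root64 mounts, /d/32 points at "this" system
  mkdir -p /d/64
  ln -s / /d/32
  # per-architecture trees in the pool; mountpoints are inherited from /canoe
  zfs create canoe/32
  zfs create canoe/32/usr
  zfs create canoe/32/var
  zfs create canoe/64
  zfs create canoe/64/usr
  zfs create canoe/64/var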
=== Things the same only different ===
For /usr/ports and /usr/src, it seems fine to set the zfs mountpoint to
/usr/ports and /usr/src. They seem to mount there through all the symlinks
just fine. In hindsight, I could have done that for /usr/share, but
currently /canoe/32/usr/share links to /canoe/64/usr/share --- so you can
use either approach for shared data. There are mild arguments for having at
least /usr/ports as a separate ZFS filesystem --- not the least of which is
the keeping of snapshot backups (or not).
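Concretely, that's just a mountpoint property on the shared datasets (a sketch):

  # shared trees that mount at the usual paths no matter which root is booted
  zfs create canoe/ports
  zfs set mountpoint=/usr/ports canoe/ports
  zfs create canoe/src
  zfs set mountpoint=/usr/src canoe/src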
Interestingly, the ZFS guide recommends creating /usr/ports and then
/usr/ports/distfiles as separate filesystems, setting compression on
/usr/ports but not on /usr/ports/distfiles. Besides the nasty pauses in
interactivity that ZFS compression causes, compressing /usr/ports as a
filesystem is futile. A stock copy of /usr/ports (with no ports built)
gets a compression ratio of 1.06 from ZFS using gzip-9 compression. On
the other hand, /usr/src compresses at about 3.5:1. Again, this only
matters if you care about disk space at all.
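For comparison, here is what the guide's layout looks like, plus a way to see
what compression actually buys you (a sketch; I only measured the ratios quoted
above, I don't run the ports tree compressed):

  # the guide's suggestion: compress the ports tree but not the distfiles under it
  zfs create canoe/ports/distfiles
  zfs set compression=gzip-9 canoe/ports
  zfs set compression=off canoe/ports/distfiles
  # see what it actually buys you
  zfs get compressratio canoe/ports canoe/src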
Another small thing you might think about sharing is /usr/sup. If you use
cvsup, this little directory contains the data pertaining to what you have
checked out in your tree. I'm not positive of the consequences of moving
back and forth between two installs sharing /usr/src and /usr/ports while
changing the /usr/sup in use --- but it seems like a good idea to share it.
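Sharing it is the same one-liner pattern as ports and src (a sketch):

  # cvsup's status files as their own shared dataset
  zfs create canoe/sup
  zfs set mountpoint=/usr/sup canoe/sup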
One concession to all this is that fetchmail checks /d/32/var/mail and
/d/64/var/mail. I suppose I could have created a ZFS /var/mail as well, but
the simple workaround was in place before I moved var onto ZFS.
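Had I done it, it would have been something like this (canoe/mail is a
hypothetical dataset name; I don't actually run this):

  # a shared mail spool instead of fetchmail checking two of them
  zfs create canoe/mail
  zfs set mountpoint=/var/mail canoe/mail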
=== On to Userland ===
Another standard from long in the dark days of my history is to put users on
/u rather than /usr/home --- less typing, etc. This used to lead to /u1,
/u2, ...
On our dual-boot machine I created /u and /u/user. I'm a little nervous
about software keeping non-architecture-clean data in your home directory.
I use spamprobe (a multi-word Bayesian filter) which stores a Berkeley DB in
~/.spamprobe. Obviously you _want_ to share this data. Currently the
problem is moot, as spamprobe didn't build in 64 bit and I ended up
copying the 32-bit binary to the 64-bit side --- so I don't yet know if this
DB is safe. Someone told me that it would compile 64-bit now --- but I
haven't tried. Inertia.
The large apps in my day --- emacs, firefox, thunderbird, xchat --- all seem
to keep their data in an architecture-independent way. At least, I haven't
had a problem with an app that I can remember.
I have also found it handy to create a few sub-user ZFS filesystems --- for
.wine and "emu" so far. .wine holds the obvious windoze things and "emu"
holds disk images for qemu emulations that I run to test some new kernel
modules I'm working on --- both of these would generate significant churn in
my snapshot size. I'm also considering a sub-filesystem for my MP3s ---
but I'm a little undecided as to how to effectively share them with the copy
on the fileserver for when the laptop is not in either BSD mode.
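The user-side layout, for reference (a sketch matching the zfs list output at
the end of this post):

  # home directories on /u, with the churn-heavy directories split out so
  # snapshots of the home directory itself stay small
  zfs create canoe/u
  zfs set mountpoint=/u canoe/u
  zfs create canoe/u/sam
  zfs create canoe/u/sam/.wine
  zfs create canoe/u/sam/emu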
=== What doesn't work OOTB ===
The startup scripts for ZFS are still a little green. One issue is that the
startup script 'requires' mountcritlocal --- I assume because it figures it
needs that so that its own filesystems will mount on top of other local
UFS ones. At least in my case, this is backwards. I need zfs to run BEFORE
mountcritlocal and BEFORE mdconfig. I have changed my REQUIRE line to 'root
hostid' ... since it's good to have the hostid already set, and having root
r/w is also good. I don't think I've solved the "BEFORE" problem, but my
requirements might make it into the CVS tree.
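For the curious, the change lives in the rcorder header of /etc/rc.d/zfs.
This is roughly what mine looks like now, with a BEFORE line expressing the
ordering I'd like (whether that alone is enough is exactly the open question):

  # PROVIDE: zfs
  # REQUIRE: root hostid
  # BEFORE: mountcritlocal mdconfig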
This dependency issue is an interesting one. I assume that the fstab code
makes sure that filesystems are mounted in a sane order ... or maybe it's
just the order in the file itself --- I've never had a problem, so I don't
know. However, having this information in two places poses an immediate
problem: one person might have a UFS /usr and a ZFS /usr/ports, and another
might have a ZFS /usr and a UFS or NFS /usr/home. Calling zfs mount -a
either before or after mountcritlocal isn't going to make everyone happy.
Maybe it needs to be called both times? I dunno. I dunno if zfs can fail
gracefully when things it needs aren't mounted yet.
Now... the "hostid" for my machine is the same in both 32 and 64 bit. This
might not be the default if your machine ran uuidgen, but my laptop does
have a uuid in its BIOS environment --- so I got off easy there. However, I've
found I still needed to add "zpool import -f canoe" to the start
function, because something about the zfs cache file (or something) isn't
entirely happy about the dual-boot process. I could have tried sharing the
zpool cache --- but I didn't have any idea what the consequences would be,
and the zpool import -f worked. It might be an idea to add an rc.conf
variable along the lines of zfs_zpool_force="canoe" (which would end up
calling "zpool import -f canoe").
=== YMMV ===
What I haven't detailed here is how you bootstrap this all. In my case, I
used Partition Magic to shrink and move around the windoze partition (the
laptop is in fact quadruple booted XP/Vista/FreeBSD-32/FreeBSD-64). There
also obviously isn't a good solution for ZFS in XP/Vista. Sharing files
with those OSs requires that I use fuse and its NTFS filesystems, and/or SMB
mounted filesystems from other computers on the network. This is imperfect at
best. It's mitigated by the fact that I generally only play games there ---
it would be much worse if I were trying to get work done.
Anyway... the bootstrap of the FreeBSD world is much like the bootstrap for
root-on-gmirror (see the handbook). In my case, I had two regular FreeBSD
installs with a ZFS /u for a while; then I backed up /usr and /var
(both 32 and 64) and the ZFS pool (using zfs send), made a much larger
ZFS pool, and repopulated it.
Another way to approach this would be to run a regular minimal install
onto a 1G root (it fits). I also have my swap outside of ZFS --- so in my
case, I have 4 fdisk partitions on each disk:
disk 1: 80M, 99G, <blank>, 205G (Dell diagnostics, Windows XP, no partition,
ZFS + swap)
disk 2: 9G, 1G, 99G, 197G (root64 + swap, root32, Vista, ZFS)
The swap partitions (ad4s4b and ad8s1b) are shared by both FreeBSD systems.
I used glabel to make this easy --- calling them /dev/label/swap[12].
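The labeling itself is straightforward (a sketch; label the slices while they
aren't in use as swap):

  # label the two swap slices so both installs can use identical fstab entries
  glabel label swap1 /dev/ad4s4b
  glabel label swap2 /dev/ad8s1b
  # then, in /etc/fstab on both roots:
  # /dev/label/swap1   none   swap   sw   0   0
  # /dev/label/swap2   none   swap   sw   0   0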
If you run a minimal install to each root, then move /usr and /var onto
ZFS, you can then run a "make world" to fill out all the missing files.
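One way to do that move (a sketch; dump/restore or cpio would work just as
well, and the dataset names assume the layout above):

  # copy the minimal install's /usr and /var onto the pool (32-bit side shown)
  cd /usr && tar -cf - . | tar -xpf - -C /canoe/32/usr
  cd /var && tar -cf - . | tar -xpf - -C /canoe/32/var
  # once the root points at the new locations, a make world fills in the rest
  cd /usr/src && make world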
=== In praise of ZFS ===
So why do all of this? My shortlist:
1. Regardless of the filesystem involved, I like to use at least
RAID1. Disks are cheap and disks fail.
     2. The ability to hand off snapshots to another running system is very
cool. It allows me to browse a filesystem that's (mostly) up-to-date when
my laptop is not online (see the sketch after this list).
     3. Conceptually, I like having many filesystems and filesystem
divisions --- /, /usr, /var, /u (a minimal set) --- but I dislike having to
waste space in one filesystem when I might need it in another.
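Item 2 above in practice looks something like this (a sketch; 'fileserver' and
the backup/sam dataset on the other end are hypothetical names):

  # snapshot the home directory and hand it to another machine
  zfs snapshot canoe/u/sam@20080307-1643
  zfs send canoe/u/sam@20080307-1643 | ssh fileserver zfs recv backup/sam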
=== Forward Thinking ===
     1. While my laptop may be one of the few to have two drives, it would
be cool to have a ZFS plugin that would spin down both drives. Then, when it
came time to write a blob, wake only one drive, write the blob, and stop the
drive again. Then, when it comes to the next point, wake the other drive,
write the first blob, the new second blob, and then spin down. And so
forth. Similarly, it might be an idea to preferentially trigger a flush of
the blob at the end of a read --- since one or more of the drives would have
spun up for that.
     2. Dependencies of /etc/rc.d/zfs need rethinking.
     3. Potentially, you could now have Solaris, OpenSolaris and FreeBSD
32/64 --- 4 OSs that support ZFS on the same computer. While there may be
reasons to boot multiple OSs on one system, it's also possible, by
installing the same packages and mounting a common (ZFS) filesystem, to keep
the services on that computer the same regardless of which OS is booted.
ZFS mounting by multiple OSs on the same computer needs thought.
     4. (non-ZFS related) It seems to me that most 64-bit systems
have depended heavily on 32-bit binaries for many things. "ls" doesn't need
a 64-bit address space, for instance. We haven't really looked at this much
in FreeBSD, but it could cut the size of a 64-bit system down a peg if it
commonly ran 32-bit binaries (rather than that being the exception).
=== Data ===
Just in case you need to visualize, here's some slightly sanitized output
from my system:
[2:2:302]sam@canoe:~> zfs list
NAME                              USED  AVAIL  REFER  MOUNTPOINT
canoe                            45.0G   144G    21K  /canoe
canoe/32                         4.52G   144G    21K  /canoe/32
canoe/32@20080307-1541             16K      -    21K  -
canoe/32/usr                     4.46G   144G  4.43G  /canoe/32/usr
canoe/32/usr@20080307-1541       30.6M      -  4.45G  -
canoe/32/usr/obj                   18K   144G    18K  /canoe/32/usr/obj
canoe/32/var                     63.7M   144G  63.2M  /canoe/32/var
canoe/32/var@20080307-1644        557K      -  63.2M  -
canoe/64                         6.30G   144G    21K  /canoe/64
canoe/64/usr                     4.96G   144G  4.69G  /canoe/64/usr
canoe/64/usr@20080307-1541        268M      -  4.76G  -
canoe/64/usr/obj                   18K   144G    18K  /canoe/64/usr/obj
canoe/64/var                     1.34G   144G  95.8M  /canoe/64/var
canoe/64/var@20080307-1541       1.25G      -  1.33G  -
canoe/ports                      2.31G   144G  2.29G  /usr/ports
canoe/ports@20080307-1541        18.9M      -  2.29G  -
canoe/ports/distfiles              18K   144G    18K  /usr/ports/distfiles
canoe/src                        2.16G   144G  2.15G  /usr/src
canoe/src@20080307-1541          8.93M      -  2.15G  -
canoe/sup                        28.7M   144G  28.7M  /usr/sup
canoe/u                          29.6G   144G    19K  /u
canoe/u/sam                      29.6G   144G  26.1G  /u/sam
canoe/u/sam@20080307-1643        17.0M      -  26.1G  -
canoe/u/sam/.wine                2.95G   144G  2.95G  /u/sam/.wine
canoe/u/sam/.wine@20080307-1541    32K      -  2.95G  -
canoe/u/sam/emu                   593M   144G   593M  /u/sam/emu
canoe/u/sam/emu@20080307-1541        0      -   593M  -
[2:3:303]sam@canoe:~> df -h
Filesystem             Size    Used   Avail Capacity  Mounted on
/dev/ufs/root32        993M    226M    687M    25%    /
devfs                  1.0K    1.0K      0B   100%    /dev
canoe                  144G      0B    144G     0%    /canoe
canoe/32               144G      0B    144G     0%    /canoe/32
canoe/32/usr           148G    4.4G    144G     3%    /canoe/32/usr
canoe/32/usr/obj       144G      0B    144G     0%    /canoe/32/usr/obj
canoe/32/var           144G     63M    144G     0%    /canoe/32/var
canoe/64               144G      0B    144G     0%    /canoe/64
canoe/64/usr           149G    4.7G    144G     3%    /canoe/64/usr
canoe/64/usr/obj       144G      0B    144G     0%    /canoe/64/usr/obj
canoe/64/var           144G     96M    144G     0%    /canoe/64/var
canoe/u                144G      0B    144G     0%    /u
canoe/u/sam            170G     26G    144G    15%    /u/sam
canoe/u/sam/.wine      147G    2.9G    144G     2%    /u/sam/.wine
canoe/u/sam/emu        145G    593M    144G     0%    /u/sam/emu
canoe/ports            146G    2.3G    144G     2%    /usr/ports
canoe/ports/distfiles  144G      0B    144G     0%    /usr/ports/distfiles
canoe/src              146G    2.2G    144G     1%    /usr/src
canoe/sup              144G     29M    144G     0%    /usr/sup
/dev/ufs/root64        989M    347M    563M    38%    /d/64
/dev/md0               1.9G     28K    1.8G     0%    /tmp
[2:4:304]sam@canoe:~> zpool status
  pool: canoe
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        canoe       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s4d  ONLINE       0     0     0
            ad8s4d  ONLINE       0     0     0

errors: No known data errors
[2:5:305]sam@canoe:~> zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
canoe   192G  45.0G   147G    23%  ONLINE  -
[2:6:306]sam@canoe:~> pstat -s
Device            1K-blocks     Used     Avail  Capacity
/dev/label/swap1    7994896        0   7994896        0%
/dev/label/swap2    8388604        0   8388604        0%
Total              16383500        0  16383500        0%