ZFS-only booting on FreeBSD
bonomi at mail.r-bonomi.com
Sat Feb 19 20:07:32 UTC 2011
> Date: Sat, 19 Feb 2011 10:35:35 -0500
> From: Daniel Staal <DStaal at usa.net>
> Subject: Re: ZFS-only booting on FreeBSD
[[.. sneck ..]]
> Basically, if a ZFS boot drive fails, you are likely to get the following:
> 1) 'What do I need to do to replace a disk in the ZFS pool?'
> 2) 'Oh, that's easy.' Replaces disk.
> 3) System fails to boot at some later point.
> 4) 'Oh, right, you need to do this *as well* on the *boot* pool...'
> Where if a UFS boot drive fails on an otherwise ZFS system, you'll get:
> 1) 'What's this drive?'
> 2) 'Oh, so how do I set that up again?'
> 3) Set up replacement boot drive.
> The first situation hides that it's a special case, where the second one
> doesn't.
"For any foolproof system, there exists a _sufficiently-determined_ fool
capable of breaking it" applies.
> To avoid the first scenario you need to make sure your sysadmins are
> following *local* (and probably out-of-band) docs, and aware of potential
> problems. And awake. ;) The scenario in the second situation presents
> its problem as a unified package, and you can rely on normal levels of
> alertness to be able to handle it correctly. (The sysadmin will realize
> it needs to be set up as a boot device because it's the boot device. ;)
> It may be complicated, but it's *obviously* complicated.)
> I'm still not clear on whether a ZFS-only system will boot with a failed
> drive in the root ZFS pool. Once booted, of course a decent ZFS setup
> should be able to recover from the failed drive. But the question is if
> the FreeBSD boot process will handle the redundancy or not. At this
> point I'm actually guessing it will, which of course only exacerbates the
> above surprise problem: 'The easy ZFS disk replacement procedure *did*
> work in the past, why did it cause a problem now?' (And conceivably it
> could cause *major* data problems at that point, as ZFS will *grow* a
> pool quite easily, but *shrinking* one is a problem.)
A non-ZFS boot drive results in immediate, _guaranteed_, down-time for
replacement if/when it fails.
A ZFS boot drive lets you replace the drive and *schedule* the down-time
(for a 'test' re-boot, to make *sure* everything works) at a convenient time.
Failure to schedule the required down time is a management failure, not
a methodology issue. One has located the requisite "sufficiently-
determined" fool, and the results thereof are to be expected.
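For what it's worth, the "do this *as well*" step from (4) in the quoted
scenario can be written down once and kept with the local docs. What follows
is only a rough sketch, not a tested procedure: it assumes a GPT-partitioned
mirror with the freebsd-boot partition at index 1, and the pool name 'zroot'
and the device names are made up for illustration. The helper
boot_disk_replacement_plan() is hypothetical; it only builds the command
strings and runs nothing.

```python
def boot_disk_replacement_plan(pool, healthy_disk, failed_vdev,
                               new_disk, new_vdev, boot_index=1):
    """Return, in order, the commands for replacing a disk in a ZFS boot pool.

    Assumptions (not from the thread): GPT layout, boot code lives in a
    freebsd-boot partition at `boot_index`, and a surviving disk exists
    whose partition table can be copied.
    """
    return [
        # Copy the partition layout from a surviving disk to the new one.
        f"gpart backup {healthy_disk} | gpart restore -F {new_disk}",
        # The ordinary, 'easy' ZFS replacement step.
        f"zpool replace {pool} {failed_vdev} {new_vdev}",
        # The step that is easy to forget: make the new disk bootable.
        f"gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot "
        f"-i {boot_index} {new_disk}",
        # Watch the resilver before scheduling the test re-boot.
        f"zpool status {pool}",
    ]


# Example with made-up device names; prints the plan, executes nothing.
for step in boot_disk_replacement_plan("zroot", "ada0", "ada1p3",
                                       "ada2", "ada2p3"):
    print(step)
```

The third command is the one the "easy" procedure leaves out, and the one
the scheduled test re-boot is there to verify.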