ZFS 'read-only' device / pool scan / import?
Karl Pielorz
kpielorz_lst at tdx.co.uk
Tue Oct 19 15:30:43 UTC 2010
--On 19 October 2010 08:16 -0700 Jeremy Chadwick <freebsd at jdc.parodius.com>
wrote:
> Experts here might be able to help, but you're really going to need to
> provide every little detail, in chronological order. What commands were
> done, what output was seen, what physical actions took place, etc.
>
> 1) Restoring from backups is probably your best bet (IMHO; this is what I
> would do as well).
I didn't provide much detail, as there isn't much detail left to provide
(the pool's since been destroyed / rebuilt). How it got messed up is almost
certainly a case of human error plus controller 'oddity' with failed devices
[which is now suitably noted for that machine!]...
It was more a 'for future reference' kind of question: does attempting to
import a pool (or even running something as simple as a 'zpool status' when
ZFS has not been loaded) actually write to the disks? i.e. could it cause
a pool that is currently 'messed up' to become permanently 'messed up',
because ZFS will change metadata on the pool if, at the time, it deems
devices to be faulted / corrupt, etc.? And, if it does, is there any way
of doing a 'test mount/import' (i.e. with the underlying devices only being
opened read-only), or does ZFS [as I suspect] *need* r/w access to those
devices as part of the work to actually import/mount them?
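(For reference, newer ZFS releases do expose a read-only import via the
'readonly' pool property; whether the ZFS version on a given box supports
it is an assumption. A minimal sketch of what that would look like, using
the pool name 'vol' from the output further down:

    # Attempt an import without writing anything back to the vdevs:
    # '-o readonly=on' opens the pool read-only, '-N' skips mounting
    # any datasets (both assume a ZFS version that supports them).
    zpool import -N -o readonly=on vol

    # Inspect the pool state, then let go of it again.
    zpool status vol
    zpool export vol

That would be the closest thing to a 'test import' I'm aware of.)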
> There's a lot of other things I could add to the item list here
> (probably reach 9 or 10 if I tried), but in general the above sounds
> like it's what happened. raidz2 would have been able to save you in this
> situation, but would require at least 4 disks.
It was RAIDZ2 - it got totally screwed:
"
vol UNAVAIL 0 0 0 insufficient replicas
raidz2 UNAVAIL 0 0 0 insufficient replicas
da3 FAULTED 0 0 0 corrupted data
da4 FAULTED 0 0 0 corrupted data
da5 FAULTED 0 0 0 corrupted data
da6 FAULTED 0 0 0 corrupted data
da7 FAULTED 0 0 0 corrupted data
da8 FAULTED 0 0 0 corrupted data
raidz2 UNAVAIL 0 0 0 insufficient replicas
da1 ONLINE 0 0 0
da2 ONLINE 0 0 0
da9 FAULTED 0 0 0 corrupted data
da10 FAULTED 0 0 0 corrupted data
da11 FAULTED 0 0 0 corrupted data
da11 ONLINE 0 0 0
"
As there is such a large element of human error (and controller behaviour)
involved, I don't think it's worth digging into it any deeper. It's the first
pool we've ever "lost" under ZFS, and like I said, with the combination of
the controller collapsing devices and humans replacing the wrong disks,
'twas doomed to fail from the start.
We've replaced failed drives on this system before, but never rebooted after
a failure and before the replacement, and never replaced the wrong drive :)
Definitely a good advert for backups though :)
-Karl