ZFS pool corrupted on upgrade of -current (probably sata renaming)
Chris Hedley
freebsd-current at chrishedley.com
Tue Jul 14 22:39:56 UTC 2009
[A short summary in advance of my rambling: it seems that my ZFS pool got
upset when the SATA drive IDs changed, and it nearly broke. I /assume/ this
hasn't been discussed (I did look), but please accept my apologies in
advance if it's already a known issue.]
I sent a rather panicked message about this yesterday; fortunately I sent
it to the wrong address so I'll send a slightly more sober version of the
same today. :)
I experienced a rather worrying problem when updating from my c. Feb 2009
version of -current to a recent build: my ZFS pool was quite badly
affected. Fortunately it hasn't /actually/ lost any data (yet), but I
think I've been lucky in that regard, and I do feel like the Sword of
Damocles is hanging over me until I've moved it somewhere safe(r).
In more detail, I had a raidz2 pool spread across eight of my 10 SATA
discs, using the same "h" partition on each, in a BSD partition table I'd
installed in "dangerously dedicated" mode. This had been working fine since
the outset, and it also survived the ZFS update around the beginning of the
year with no problems.
This time, however, things got extremely hairy. Two of the component discs
disappeared altogether: ad12 and ad22 in the new parlance, which would
appear to be ad4 and ad6 in the old. This is perhaps significant, as the
two discs using the names ad4 and ad6 in the new nomenclature (formerly
ad0 and ad1 respectively) were also reporting IO errors. I thought I'd had
it, as there's no way a raidz2 can survive four disc failures. But ad4 and
ad6 are the two drive names shared between the old and the new numbering
schemes, and the "missing" discs, ad12 and ad22, are the "old" ad4 and ad6.
I'm probably explaining this badly, so here's a table of the old and new
names:
disc      old   new
----      ---   ---
disc 1:   ad0   ad4    - IO errors on "new" ad4
disc 2:   ad1   ad6    - IO errors on "new" ad6
disc 3:   ad2   ad8
disc 4:   ad3   ad10
disc 5:   ad4   ad12   - "old" ad4 (now ad12) removed from pool
disc 6:   ad5   ad20
disc 7:   ad6   ad22   - "old" ad6 (now ad22) removed from pool
disc 8:   ad7   ad24
In writing this down I think I can see clearly what the problem was,
though I've been unable to find any mention of how to get ZFS to adapt to
the drive names changing. (Maybe it's more obvious to ZFS veterans, but I'm
not one of them!)
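To make the overlap in the table concrete, here is a minimal sketch in
Python. It is only an illustration: the names RENAMES and find_collisions
are made up for this example, and it assumes (without confirmation) that
the trouble comes from the pool remembering its members by their old device
names. It simply reports which names appear in both numbering schemes and
which disc held each name before and after the upgrade.

```python
# Old and new device names for the eight pool members, copied from the
# table above. RENAMES and find_collisions are hypothetical names used
# only for this illustration.
RENAMES = [
    ("ad0", "ad4"),
    ("ad1", "ad6"),
    ("ad2", "ad8"),
    ("ad3", "ad10"),
    ("ad4", "ad12"),
    ("ad5", "ad20"),
    ("ad6", "ad22"),
    ("ad7", "ad24"),
]


def find_collisions(renames):
    """Map each name used by both schemes to (old disc number, new disc number)."""
    # Which disc (numbered from 1, as in the table) owned each old name.
    old_owner = {old: disc for disc, (old, _) in enumerate(renames, start=1)}
    collisions = {}
    for disc, (_, new) in enumerate(renames, start=1):
        # A new name that was also an old name now points at a different disc.
        if new in old_owner:
            collisions[new] = (old_owner[new], disc)
    return collisions


if __name__ == "__main__":
    for name, (was, now) in sorted(find_collisions(RENAMES).items()):
        print(f"{name}: was disc {was}, is now disc {now}")
```

Run against the table, this flags only ad4 and ad6, which between them
involve discs 1, 2, 5 and 7: the same four discs that either reported IO
errors or dropped out of the pool.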
At present I'm moving my data off the ZFS array before it totally confuses
itself and eats my stuff, and enjoying the feeling of being rather cold
and clammy while my data's on non-redundant drives for the first time in
years. I'll probably use a couple of big and simple gmirror arrays in the
short term, but I'd like to rebuild my ZFS pool without worrying about the
same thing happening again. Could anyone offer suggestions, or perhaps
make ZFS a bit less dependent on FreeBSD's idea of what a disc is called,
or at least point me at something I should've read that might have avoided
all this stuff happening in the first place...?
Thanks,
Chris.