zfs tasting dropped a stripe out of my pool. help getting it back?
GM
gildenman at gmail.com
Thu Jul 14 17:37:52 UTC 2011
Hi,
Whilst the way zfs looks for its data everywhere can be useful when devices change,
I've been rather stung by it.
I have a raidz2 with 4x2TB and 2x 2x1TB stripes to make 6x2TB in total.
I currently have this:
  pool: pool2
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
  scan: resilvered 1.83M in 0h0m with 0 errors on Thu Jul 14 14:59:22 2011
config:

        NAME                      STATE     READ WRITE CKSUM
        pool2                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            gpt/2TB_drive0        ONLINE       0     0     0
            gpt/2TB_drive1        ONLINE       0     0     0
            gpt/2TB_drive2        ONLINE       0     0     0
            13298804679359865221  UNAVAIL      0     0     0  was /dev/gpt/1TB_drive0
            12966661380732156057  UNAVAIL      0     0     0  was /dev/gpt/1TB_drive2
            gpt/2TB_drive3        ONLINE       0     0     0
        cache
          gpt/cache0              ONLINE       0     0     0
The two UNAVAIL entries used to be stripes. The system helpfully removed them for me.
These are the stripes that used to be in the pool:
# gstripe status
               Name  Status  Components
stripe/1TB_drive0+1      UP  gpt/1TB_drive1
                             gpt/1TB_drive0
stripe/1TB_drive2+3      UP  gpt/1TB_drive3
                             gpt/1TB_drive2
They still exist and have all the data in them.
It started when I booted up with the drive that holds gpt/1TB_drive1 missing. zfs helpfully replaced the
stripe/1TB_drive0+1 device with gpt/1TB_drive0 and told me it had corrupt data on it.
Am I right in thinking that, because one drive was missing and stripe/1TB_drive0+1 was
therefore missing too, zfs tasted around and found that gpt/1TB_drive0 had what looks like
the right header on it? However, 64k in, it would find incorrect data, as the next 64k was
on the missing part of the stripe, on gpt/1TB_drive1.
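
To make the 64k point concrete, here is a rough sketch of how a two-disk stripe lays data out. It is only an illustration: the 64k stripe size is the figure I quoted above, I am assuming gpt/1TB_drive0 is the first component of the stripe, and the function names are made up for the example. It says nothing about what zfs itself reads when it tastes a device.

```python
# Sketch of two-disk striping (assumed 64k interleave, component 0 first).
STRIPE_SIZE = 64 * 1024
COMPONENTS = ["gpt/1TB_drive0", "gpt/1TB_drive1"]  # assumed order


def locate(offset, ndisks=2, stripe_size=STRIPE_SIZE):
    """Map a byte offset on the stripe to (component index, offset on it)."""
    chunk = offset // stripe_size
    disk = chunk % ndisks
    disk_offset = (chunk // ndisks) * stripe_size + offset % stripe_size
    return disk, disk_offset


def stripe_offset_of(disk, disk_offset, ndisks=2, stripe_size=STRIPE_SIZE):
    """Inverse mapping: which stripe offset a byte on one component holds."""
    chunk = disk_offset // stripe_size
    return (chunk * ndisks + disk) * stripe_size + disk_offset % stripe_size


if __name__ == "__main__":
    # Where the first few 64k chunks of the stripe really live.
    for n in range(4):
        disk, disk_offset = locate(n * STRIPE_SIZE)
        print(f"stripe offset {n * 64}k -> {COMPONENTS[disk]} at {disk_offset // 1024}k")

    # Reading the bare first component at 64k returns data that belongs
    # at a different offset of the stripe.
    belongs_at = stripe_offset_of(0, STRIPE_SIZE)
    print(f"{COMPONENTS[0]} at 64k holds stripe offset {belongs_at // 1024}k")
```

Under those assumptions the first 64k of the stripe and the first 64k of the bare component are the same bytes, but everything after that on the bare component is shifted relative to where the stripe would present it.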
I was contemplating how to get the stripe back into the pool without having to do a complete
resilver on it. Seemed unnecessary to have to do that when the data was all there.
I thought an export and import might help it find it. However, for some reason that did the same
thing to the other stripe, stripe/1TB_drive2+3, which got replaced with gpt/1TB_drive2.
Now I am left without parity.
Any ideas on what commands will bring this back?
I know I can do a replace on both, but if there is some undetected corruption on the other devices then I will
lose some data, as any parity that could fix it is currently missing. I do scrub regularly, but I'd prefer not
to take that chance. Especially as I have all the data sitting there!
I'm hoping someone has some magic zfs commands to make all this go away :)
What can I do to prevent this in future? I've run pools with stripes for years without this happening.
It seems zfs has started to look far and wide for its devices? In the past, if the stripe was broken,
it would just tell me the device was missing. When the stripe was back, all was fine. However,
with this tasting everywhere, it seems stripes are now a no-no for zpools?
Thanks.