Degraded zpool cannot detach old/bad drive
Rumen Telbizov
telbizov at gmail.com
Tue Oct 26 20:04:54 UTC 2010
Hello everyone,
After a few days of struggle with my degraded zpool on a backup server I
decided to ask for
help here or at least get some clues as to what might be wrong with it.
Here's the current state of the zpool:
# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                            STATE     READ WRITE CKSUM
        tank                            DEGRADED     0     0     0
          raidz1                        DEGRADED     0     0     0
            spare                       DEGRADED     0     0     0
              replacing                 DEGRADED     0     0     0
                17307041822177798519    UNAVAIL      0   299     0  was /dev/gpt/disk-e1:s2
                gpt/newdisk-e1:s2       ONLINE       0     0     0
              gpt/disk-e2:s10           ONLINE       0     0     0
            gpt/disk-e1:s3              ONLINE      30     0     0
            gpt/disk-e1:s4              ONLINE       0     0     0
            gpt/disk-e1:s5              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s6              ONLINE       0     0     0
            gpt/disk-e1:s7              ONLINE       0     0     0
            gpt/disk-e1:s8              ONLINE       0     0     0
            gpt/disk-e1:s9              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s10             ONLINE       0     0     0
            gpt/disk-e1:s11             ONLINE       0     0     0
            gpt/disk-e1:s12             ONLINE       0     0     0
            gpt/disk-e1:s13             ONLINE       0     0     0
          raidz1                        DEGRADED     0     0     0
            gpt/disk-e1:s14             ONLINE       0     0     0
            gpt/disk-e1:s15             ONLINE       0     0     0
            gpt/disk-e1:s16             ONLINE       0     0     0
            spare                       DEGRADED     0     0     0
              replacing                 DEGRADED     0     0     0
                15258738282880603331    UNAVAIL      0    48     0  was /dev/gpt/disk-e1:s17
                gpt/newdisk-e1:s17      ONLINE       0     0     0
              gpt/disk-e2:s11           ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s18             ONLINE       0     0     0
            gpt/disk-e1:s19             ONLINE       0     0     0
            gpt/disk-e1:s20             ONLINE       0     0     0
            gpt/disk-e1:s21             ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s22             ONLINE       0     0     0
            gpt/disk-e1:s23             ONLINE       0     0     0
            gpt/disk-e2:s0              ONLINE       0     0     0
            gpt/disk-e2:s1              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e2:s2              ONLINE       0     0     0
            gpt/disk-e2:s3              ONLINE       0     0     0
            gpt/disk-e2:s4              ONLINE       0     0     0
            gpt/disk-e2:s5              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e2:s6              ONLINE       0     0     0
            gpt/disk-e2:s7              ONLINE       0     0     0
            gpt/disk-e2:s8              ONLINE       0     0     0
            gpt/disk-e2:s9              ONLINE       0     0     0
        spares
          gpt/disk-e2:s10               INUSE     currently in use
          gpt/disk-e2:s11               INUSE     currently in use
          gpt/disk-e1:s2                UNAVAIL   cannot open
          gpt/newdisk-e1:s17            INUSE     currently in use

errors: 4 data errors, use '-v' for a list
The problem: after replacing the bad drives and resilvering, the old/bad drives cannot be detached.
The replace operation didn't remove them automatically, and a manual detach fails.
Here are some examples:
# zpool detach tank 15258738282880603331
cannot detach 15258738282880603331: no valid replicas
# zpool detach tank gpt/disk-e2:s11
cannot detach gpt/disk-e2:s11: no valid replicas
# zpool detach tank gpt/newdisk-e1:s17
cannot detach gpt/newdisk-e1:s17: no valid replicas
# zpool detach tank gpt/disk-e1:s17
cannot detach gpt/disk-e1:s17: no valid replicas
Here's some more information and a history of events.
This is a 36-disk SuperMicro 847 machine with 2T WD RE4 disks organized in raidz1 groups as
depicted above. The zpool deals only with GPT partitions like this one:
=>          34  3904294845  mfid30  GPT  (1.8T)
            34  3903897600       1  disk-e2:s9  (1.8T)
    3903897634      397245          - free -  (194M)
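For reference, each of those partitions was created with gpart roughly like this (the device name and the explicit size are just taken from the example above):

# gpart create -s gpt mfid30
# gpart add -t freebsd-zfs -l disk-e2:s9 -s 3903897600 mfid30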
The mfidXX devices are disks connected to a SuperMicro/LSI controller and presented as JBODs.
JBODs on this adapter are actually constructed as single-disk RAID0 arrays, but that should be
irrelevant in this case.
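If it matters, such a single-disk RAID0 volume can be created on this controller with mfiutil, something along these lines (the drive ID here is only an example):

# mfiutil create jbod 30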
This machine had been working fine since September 6th, but two of the disks (in different raidz1
vdevs) were going bad and accumulated quite a few errors until they eventually died.
This is how they looked:
          raidz1                        DEGRADED     0     0     0
            gpt/disk-e1:s2              UNAVAIL     44 59.5K     0  experienced I/O failures
            gpt/disk-e1:s3              ONLINE       0     0     0
            gpt/disk-e1:s4              ONLINE       0     0     0
            gpt/disk-e1:s5              ONLINE       0     0     0
          raidz1                        DEGRADED     0     0     0
            gpt/disk-e1:s14             ONLINE       0     0     0
            gpt/disk-e1:s15             ONLINE       0     0     0
            gpt/disk-e1:s16             ONLINE       0     0     0
            gpt/disk-e1:s17             UNAVAIL  1.56K 49.0K     0  experienced I/O failures
I did have two spare disks ready to replace them. So after they died here's
what I executed:
# zpool replace tank gpt/disk-e1:s2 gpt/disk-e2:s10
# zpool replace tank gpt/disk-e1:s17 gpt/disk-e2:s11
Resilvering started. In the middle of it, though, the kernel panicked and I had to reboot the machine.
After the reboot I waited until the resilvering was complete. Once it finished I expected to see the
old/bad device removed from the vdev, but it was still there. Trying to detach it kept failing with
'no valid replicas'.
I had a colo technician replace both of those defective drives with brand new ones. Once they were
inserted, I set them up exactly the same way as the ones I had before: a JBOD volume and a
gpart-labeled partition with the same name! Then I added them as spares:
# zpool add tank spare gpt/disk-e1:s2
# zpool add tank spare gpt/disk-e1:s17
That actually made things worse, I think, since I now had the same device name both as the
'previous' failed device inside the raidz1 group and as a hot spare. I couldn't do anything with it.
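For what it's worth, an unused hot spare can normally be dropped again with a plain remove, e.g.:

# zpool remove tank gpt/disk-e1:s2

but with the same label present both inside the raidz1 group and in the spares list I saw no safe way to point any command at the right device.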
What I did was export the pool, fail the disk on the controller, import the pool, and check that ZFS
could no longer open it (as part of the hot spares). Then I recreated that disk/partition with a new
label, 'newdisk-XXX', and tried to replace the device that had originally failed (and was by now only
shown as a number); the gpart relabel step itself is sketched right after the commands below.
So I did this:
# zpool replace tank gpt/disk-e1:s17 gpt/newdisk-e1:s17
# zpool replace tank gpt/disk-e1:s2 gpt/newdisk-e1:s2
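The relabeling itself was plain gpart again; an in-place label change along these lines does it (the mfid number is only an example; destroying and re-adding the partition with the new label amounts to the same thing):

# gpart modify -i 1 -l newdisk-e1:s2 mfid2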
Resilvering completed after 17 hours or so, and I expected the 'replacing' vdev to disappear and the
replaced device to go away. But it didn't! Instead I have the pool state shown at the beginning of
this email.
As for the 'errors: 4 data errors, use '-v' for a list', I suspect it's due to another failing
device (gpt/disk-e1:s3) inside the first (currently degraded) raidz1 vdev. Those 4 corrupted files
can actually be read sometimes, which tells me the disk only *sometimes* has trouble reading those
bad blocks.
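The affected files themselves are listed by:

# zpool status -v tank

and I assume a full scrub (# zpool scrub tank) is in order once that disk is dealt with.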
Here's the output of zdb -l tank
version=14
name='tank'
state=0
txg=200225
pool_guid=13504509992978610301
hostid=409325918
hostname='XXXX'
vdev_tree
    type='root'
    id=0
    guid=13504509992978610301
    children[0]
        type='raidz'
        id=0
        guid=3740854890192825394
        nparity=1
        metaslab_array=33
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='spare'
            id=0
            guid=16171901098004278313
            whole_disk=0
            children[0]
                type='replacing'
                id=0
                guid=2754550310390861576
                whole_disk=0
                children[0]
                    type='disk'
                    id=0
                    guid=17307041822177798519
                    path='/dev/gpt/disk-e1:s2'
                    whole_disk=0
                    not_present=1
                    DTL=246
                children[1]
                    type='disk'
                    id=1
                    guid=1641394056824955485
                    path='/dev/gpt/newdisk-e1:s2'
                    whole_disk=0
                    DTL=55
            children[1]
                type='disk'
                id=1
                guid=13150356781300468512
                path='/dev/gpt/disk-e2:s10'
                whole_disk=0
                is_spare=1
                DTL=1289
        children[1]
            type='disk'
            id=1
            guid=6047192237176807561
            path='/dev/gpt/disk-e1:s3'
            whole_disk=0
            DTL=250
        children[2]
            type='disk'
            id=2
            guid=9178318500891071208
            path='/dev/gpt/disk-e1:s4'
            whole_disk=0
            DTL=249
        children[3]
            type='disk'
            id=3
            guid=2567999855746767831
            path='/dev/gpt/disk-e1:s5'
            whole_disk=0
            DTL=248
    children[1]
        type='raidz'
        id=1
        guid=17097047310177793733
        nparity=1
        metaslab_array=31
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=14513380297393196654
            path='/dev/gpt/disk-e1:s6'
            whole_disk=0
            DTL=266
        children[1]
            type='disk'
            id=1
            guid=7673391645329839273
            path='/dev/gpt/disk-e1:s7'
            whole_disk=0
            DTL=265
        children[2]
            type='disk'
            id=2
            guid=15189132305590412134
            path='/dev/gpt/disk-e1:s8'
            whole_disk=0
            DTL=264
        children[3]
            type='disk'
            id=3
            guid=17171875527714022076
            path='/dev/gpt/disk-e1:s9'
            whole_disk=0
            DTL=263
    children[2]
        type='raidz'
        id=2
        guid=4551002265962803186
        nparity=1
        metaslab_array=30
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=12104241519484712161
            path='/dev/gpt/disk-e1:s10'
            whole_disk=0
            DTL=262
        children[1]
            type='disk'
            id=1
            guid=3950210349623142325
            path='/dev/gpt/disk-e1:s11'
            whole_disk=0
            DTL=261
        children[2]
            type='disk'
            id=2
            guid=14559903955698640085
            path='/dev/gpt/disk-e1:s12'
            whole_disk=0
            DTL=260
        children[3]
            type='disk'
            id=3
            guid=12364155114844220066
            path='/dev/gpt/disk-e1:s13'
            whole_disk=0
            DTL=259
    children[3]
        type='raidz'
        id=3
        guid=12517231224568010294
        nparity=1
        metaslab_array=29
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=7655789038925330983
            path='/dev/gpt/disk-e1:s14'
            whole_disk=0
            DTL=258
        children[1]
            type='disk'
            id=1
            guid=17815755378968233141
            path='/dev/gpt/disk-e1:s15'
            whole_disk=0
            DTL=257
        children[2]
            type='disk'
            id=2
            guid=9590421681925673767
            path='/dev/gpt/disk-e1:s16'
            whole_disk=0
            DTL=256
        children[3]
            type='spare'
            id=3
            guid=4015417100051235398
            whole_disk=0
            children[0]
                type='replacing'
                id=0
                guid=11653429697330193176
                whole_disk=0
                children[0]
                    type='disk'
                    id=0
                    guid=15258738282880603331
                    path='/dev/gpt/disk-e1:s17'
                    whole_disk=0
                    not_present=1
                    DTL=255
                children[1]
                    type='disk'
                    id=1
                    guid=908651380690954833
                    path='/dev/gpt/newdisk-e1:s17'
                    whole_disk=0
                    is_spare=1
                    DTL=52
            children[1]
                type='disk'
                id=1
                guid=7250934196571906160
                path='/dev/gpt/disk-e2:s11'
                whole_disk=0
                is_spare=1
                DTL=1292
    children[4]
        type='raidz'
        id=4
        guid=7622366288306613136
        nparity=1
        metaslab_array=28
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=11283483106921343963
            path='/dev/gpt/disk-e1:s18'
            whole_disk=0
            DTL=254
        children[1]
            type='disk'
            id=1
            guid=14900597968455968576
            path='/dev/gpt/disk-e1:s19'
            whole_disk=0
            DTL=253
        children[2]
            type='disk'
            id=2
            guid=4140592611852504513
            path='/dev/gpt/disk-e1:s20'
            whole_disk=0
            DTL=252
        children[3]
            type='disk'
            id=3
            guid=2794215380207576975
            path='/dev/gpt/disk-e1:s21'
            whole_disk=0
            DTL=251
    children[5]
        type='raidz'
        id=5
        guid=17655293908271300889
        nparity=1
        metaslab_array=27
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=5274146379037055039
            path='/dev/gpt/disk-e1:s22'
            whole_disk=0
            DTL=278
        children[1]
            type='disk'
            id=1
            guid=8651755019404873686
            path='/dev/gpt/disk-e1:s23'
            whole_disk=0
            DTL=277
        children[2]
            type='disk'
            id=2
            guid=16827379661759988976
            path='/dev/gpt/disk-e2:s0'
            whole_disk=0
            DTL=276
        children[3]
            type='disk'
            id=3
            guid=2524967151333933972
            path='/dev/gpt/disk-e2:s1'
            whole_disk=0
            DTL=275
    children[6]
        type='raidz'
        id=6
        guid=2413519694016115220
        nparity=1
        metaslab_array=26
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=16361968944335143412
            path='/dev/gpt/disk-e2:s2'
            whole_disk=0
            DTL=274
        children[1]
            type='disk'
            id=1
            guid=10054650477559530937
            path='/dev/gpt/disk-e2:s3'
            whole_disk=0
            DTL=273
        children[2]
            type='disk'
            id=2
            guid=17105959045159531558
            path='/dev/gpt/disk-e2:s4'
            whole_disk=0
            DTL=272
        children[3]
            type='disk'
            id=3
            guid=17370453969371497663
            path='/dev/gpt/disk-e2:s5'
            whole_disk=0
            DTL=271
    children[7]
        type='raidz'
        id=7
        guid=4614010953103453823
        nparity=1
        metaslab_array=24
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=10090128057592036175
            path='/dev/gpt/disk-e2:s6'
            whole_disk=0
            DTL=270
        children[1]
            type='disk'
            id=1
            guid=16676544025008223925
            path='/dev/gpt/disk-e2:s7'
            whole_disk=0
            DTL=269
        children[2]
            type='disk'
            id=2
            guid=11777789246954957292
            path='/dev/gpt/disk-e2:s8'
            whole_disk=0
            DTL=268
        children[3]
            type='disk'
            id=3
            guid=3406600121427522915
            path='/dev/gpt/disk-e2:s9'
            whole_disk=0
            DTL=267
OS:
8.1-STABLE FreeBSD 8.1-STABLE #0: Sun Sep 5 00:22:45 PDT 2010 amd64

Hardware:
Chassis: SuperMicro 847E1 (two backplanes: 24 disks in front and 12 disks in the back)
Motherboard: X8SIL
CPU: 1 x X3430 @ 2.40GHz
RAM: 16G
HDD Controller: SuperMicro / LSI 9260 (pciconf -lv: SAS1078 PCI-X Fusion-MPT SAS), 2 ports
Disks: 36 x 2T Western Digital RE4
Any help would be appreciated. Let me know what additional information I
should provide.
Thank you in advance,
--
Rumen Telbizov