Degraded zpool cannot detach old/bad drive
Rumen Telbizov
telbizov at gmail.com
Tue Oct 26 20:04:54 UTC 2010
Hello everyone,
After a few days of struggle with my degraded zpool on a backup server I
decided to ask for
help here or at least get some clues as to what might be wrong with it.
Here's the current state of the zpool:
# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                            STATE     READ WRITE CKSUM
        tank                            DEGRADED     0     0     0
          raidz1                        DEGRADED     0     0     0
            spare                       DEGRADED     0     0     0
              replacing                 DEGRADED     0     0     0
                17307041822177798519    UNAVAIL      0   299     0  was /dev/gpt/disk-e1:s2
                gpt/newdisk-e1:s2       ONLINE       0     0     0
              gpt/disk-e2:s10           ONLINE       0     0     0
            gpt/disk-e1:s3              ONLINE      30     0     0
            gpt/disk-e1:s4              ONLINE       0     0     0
            gpt/disk-e1:s5              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s6              ONLINE       0     0     0
            gpt/disk-e1:s7              ONLINE       0     0     0
            gpt/disk-e1:s8              ONLINE       0     0     0
            gpt/disk-e1:s9              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s10             ONLINE       0     0     0
            gpt/disk-e1:s11             ONLINE       0     0     0
            gpt/disk-e1:s12             ONLINE       0     0     0
            gpt/disk-e1:s13             ONLINE       0     0     0
          raidz1                        DEGRADED     0     0     0
            gpt/disk-e1:s14             ONLINE       0     0     0
            gpt/disk-e1:s15             ONLINE       0     0     0
            gpt/disk-e1:s16             ONLINE       0     0     0
            spare                       DEGRADED     0     0     0
              replacing                 DEGRADED     0     0     0
                15258738282880603331    UNAVAIL      0    48     0  was /dev/gpt/disk-e1:s17
                gpt/newdisk-e1:s17      ONLINE       0     0     0
              gpt/disk-e2:s11           ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s18             ONLINE       0     0     0
            gpt/disk-e1:s19             ONLINE       0     0     0
            gpt/disk-e1:s20             ONLINE       0     0     0
            gpt/disk-e1:s21             ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e1:s22             ONLINE       0     0     0
            gpt/disk-e1:s23             ONLINE       0     0     0
            gpt/disk-e2:s0              ONLINE       0     0     0
            gpt/disk-e2:s1              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e2:s2              ONLINE       0     0     0
            gpt/disk-e2:s3              ONLINE       0     0     0
            gpt/disk-e2:s4              ONLINE       0     0     0
            gpt/disk-e2:s5              ONLINE       0     0     0
          raidz1                        ONLINE       0     0     0
            gpt/disk-e2:s6              ONLINE       0     0     0
            gpt/disk-e2:s7              ONLINE       0     0     0
            gpt/disk-e2:s8              ONLINE       0     0     0
            gpt/disk-e2:s9              ONLINE       0     0     0
        spares
          gpt/disk-e2:s10               INUSE     currently in use
          gpt/disk-e2:s11               INUSE     currently in use
          gpt/disk-e1:s2                UNAVAIL   cannot open
          gpt/newdisk-e1:s17            INUSE     currently in use

errors: 4 data errors, use '-v' for a list
The problem: after replacing the bad drives and resilvering, the old/bad drives cannot be detached.
The replace operation didn't remove them automatically, and a manual detach fails.
Here are some examples:
# zpool detach tank 15258738282880603331
cannot detach 15258738282880603331: no valid replicas
# zpool detach tank gpt/disk-e2:s11
cannot detach gpt/disk-e2:s11: no valid replicas
# zpool detach tank gpt/newdisk-e1:s17
cannot detach gpt/newdisk-e1:s17: no valid replicas
# zpool detach tank gpt/disk-e1:s17
cannot detach gpt/disk-e1:s17: no valid replicas
Here's some more information and a history of events.
This is a 36-disk SuperMicro 847 machine with 2T WD RE4 disks organized in raidz1 groups as
depicted above. The zpool deals only with GPT partitions like this one:
=>          34  3904294845  mfid30  GPT  (1.8T)
            34  3903897600       1  disk-e2:s9  (1.8T)
    3903897634      397245          - free -  (194M)
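For reference, each of those partitions was created with gpart roughly like this (the device name and the explicit size are just taken from the example above):

# gpart create -s gpt mfid30
# gpart add -t freebsd-zfs -l disk-e2:s9 -s 3903897600 mfid30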
The mfidXX devices are disks connected to a SuperMicro/LSI controller and presented as JBODs.
JBODs on this adapter are actually constructed as single-disk RAID0 arrays, but that should be
irrelevant in this case.
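If it matters, such a single-disk RAID0 volume can be created on this controller with mfiutil, something along these lines (the drive ID here is only an example):

# mfiutil create jbod 30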
This machine had been working fine since September 6th, but two of the disks (in different raidz1
vdevs) were going bad and accumulated quite a few errors until they eventually died.
This is how they looked:
          raidz1                        DEGRADED     0     0     0
            gpt/disk-e1:s2              UNAVAIL     44 59.5K     0  experienced I/O failures
            gpt/disk-e1:s3              ONLINE       0     0     0
            gpt/disk-e1:s4              ONLINE       0     0     0
            gpt/disk-e1:s5              ONLINE       0     0     0
          raidz1                        DEGRADED     0     0     0
            gpt/disk-e1:s14             ONLINE       0     0     0
            gpt/disk-e1:s15             ONLINE       0     0     0
            gpt/disk-e1:s16             ONLINE       0     0     0
            gpt/disk-e1:s17             UNAVAIL  1.56K 49.0K     0  experienced I/O failures
I did have two spare disks ready to replace them. So after they died here's
what I executed:
# zpool replace tank gpt/disk-e1:s2 gpt/disk-e2:s10
# zpool replace tank gpt/disk-e1:s17 gpt/disk-e2:s11
Resilvering started. In the middle of it, though, the kernel panicked and I had to reboot the machine.
After the reboot I waited until the resilvering was complete. Once it finished I expected to see the
old/bad device removed from the vdev, but it was still there. Trying to detach it kept failing with
'no valid replicas'.
I had a colo technician replace both of those defective drives with brand new ones. Once they were
inserted, I set them up exactly the same way as the ones I had before: a JBOD volume and a
gpart-labeled partition with the same name! Then I added them as spares:
# zpool add tank spare gpt/disk-e1:s2
# zpool add tank spare gpt/disk-e1:s17
That actually made things worse, I think, since I now had the same device name both as the
'previous' failed device inside the raidz1 group and as a hot spare. I couldn't do anything with it.
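For what it's worth, an unused hot spare can normally be dropped again with a plain remove, e.g.:

# zpool remove tank gpt/disk-e1:s2

but with the same label present both inside the raidz1 group and in the spares list I saw no safe way to point any command at the right device.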
What I did was export the pool, fail the disk on the controller, import the pool, and check that ZFS
could no longer open it (as part of the hot spares). Then I recreated that disk/partition with a new
label, 'newdisk-XXX', and tried to replace the device that had originally failed (and was by now only
shown as a number); the gpart relabel step itself is sketched right after the commands below.
So I did this:
# zpool replace tank gpt/disk-e1:s17 gpt/newdisk-e1:s17
# zpool replace tank gpt/disk-e1:s2 gpt/newdisk-e1:s2
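The relabeling itself was plain gpart again; an in-place label change along these lines does it (the mfid number is only an example; destroying and re-adding the partition with the new label amounts to the same thing):

# gpart modify -i 1 -l newdisk-e1:s2 mfid2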
Resilvering completed after 17 hours or so, and I expected the 'replacing' vdev to disappear and the
replaced device to go away. But it didn't! Instead I have the pool state shown at the beginning of
this email.
As for the 'errors: 4 data errors, use '-v' for a list', I suspect it's due to another failing
device (gpt/disk-e1:s3) inside the first (currently degraded) raidz1 vdev. Those 4 corrupted files
can actually be read sometimes, which tells me the disk only *sometimes* has trouble reading those
bad blocks.
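The affected files themselves are listed by:

# zpool status -v tank

and I assume a full scrub (# zpool scrub tank) is in order once that disk is dealt with.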
Here's the output of zdb -l tank
version=14
name='tank'
state=0
txg=200225
pool_guid=13504509992978610301
hostid=409325918
hostname='XXXX'
vdev_tree
    type='root'
    id=0
    guid=13504509992978610301
    children[0]
        type='raidz'
        id=0
        guid=3740854890192825394
        nparity=1
        metaslab_array=33
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='spare'
            id=0
            guid=16171901098004278313
            whole_disk=0
            children[0]
                type='replacing'
                id=0
                guid=2754550310390861576
                whole_disk=0
                children[0]
                    type='disk'
                    id=0
                    guid=17307041822177798519
                    path='/dev/gpt/disk-e1:s2'
                    whole_disk=0
                    not_present=1
                    DTL=246
                children[1]
                    type='disk'
                    id=1
                    guid=1641394056824955485
                    path='/dev/gpt/newdisk-e1:s2'
                    whole_disk=0
                    DTL=55
            children[1]
                type='disk'
                id=1
                guid=13150356781300468512
                path='/dev/gpt/disk-e2:s10'
                whole_disk=0
                is_spare=1
                DTL=1289
        children[1]
            type='disk'
            id=1
            guid=6047192237176807561
            path='/dev/gpt/disk-e1:s3'
            whole_disk=0
            DTL=250
        children[2]
            type='disk'
            id=2
            guid=9178318500891071208
            path='/dev/gpt/disk-e1:s4'
            whole_disk=0
            DTL=249
        children[3]
            type='disk'
            id=3
            guid=2567999855746767831
            path='/dev/gpt/disk-e1:s5'
            whole_disk=0
            DTL=248
    children[1]
        type='raidz'
        id=1
        guid=17097047310177793733
        nparity=1
        metaslab_array=31
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=14513380297393196654
            path='/dev/gpt/disk-e1:s6'
            whole_disk=0
            DTL=266
        children[1]
            type='disk'
            id=1
            guid=7673391645329839273
            path='/dev/gpt/disk-e1:s7'
            whole_disk=0
            DTL=265
        children[2]
            type='disk'
            id=2
            guid=15189132305590412134
            path='/dev/gpt/disk-e1:s8'
            whole_disk=0
            DTL=264
        children[3]
            type='disk'
            id=3
            guid=17171875527714022076
            path='/dev/gpt/disk-e1:s9'
            whole_disk=0
            DTL=263
    children[2]
        type='raidz'
        id=2
        guid=4551002265962803186
        nparity=1
        metaslab_array=30
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=12104241519484712161
            path='/dev/gpt/disk-e1:s10'
            whole_disk=0
            DTL=262
        children[1]
            type='disk'
            id=1
            guid=3950210349623142325
            path='/dev/gpt/disk-e1:s11'
            whole_disk=0
            DTL=261
        children[2]
            type='disk'
            id=2
            guid=14559903955698640085
            path='/dev/gpt/disk-e1:s12'
            whole_disk=0
            DTL=260
        children[3]
            type='disk'
            id=3
            guid=12364155114844220066
            path='/dev/gpt/disk-e1:s13'
            whole_disk=0
            DTL=259
    children[3]
        type='raidz'
        id=3
        guid=12517231224568010294
        nparity=1
        metaslab_array=29
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=7655789038925330983
            path='/dev/gpt/disk-e1:s14'
            whole_disk=0
            DTL=258
        children[1]
            type='disk'
            id=1
            guid=17815755378968233141
            path='/dev/gpt/disk-e1:s15'
            whole_disk=0
            DTL=257
        children[2]
            type='disk'
            id=2
            guid=9590421681925673767
            path='/dev/gpt/disk-e1:s16'
            whole_disk=0
            DTL=256
        children[3]
            type='spare'
            id=3
            guid=4015417100051235398
            whole_disk=0
            children[0]
                type='replacing'
                id=0
                guid=11653429697330193176
                whole_disk=0
                children[0]
                    type='disk'
                    id=0
                    guid=15258738282880603331
                    path='/dev/gpt/disk-e1:s17'
                    whole_disk=0
                    not_present=1
                    DTL=255
                children[1]
                    type='disk'
                    id=1
                    guid=908651380690954833
                    path='/dev/gpt/newdisk-e1:s17'
                    whole_disk=0
                    is_spare=1
                    DTL=52
            children[1]
                type='disk'
                id=1
                guid=7250934196571906160
                path='/dev/gpt/disk-e2:s11'
                whole_disk=0
                is_spare=1
                DTL=1292
    children[4]
        type='raidz'
        id=4
        guid=7622366288306613136
        nparity=1
        metaslab_array=28
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=11283483106921343963
            path='/dev/gpt/disk-e1:s18'
            whole_disk=0
            DTL=254
        children[1]
            type='disk'
            id=1
            guid=14900597968455968576
            path='/dev/gpt/disk-e1:s19'
            whole_disk=0
            DTL=253
        children[2]
            type='disk'
            id=2
            guid=4140592611852504513
            path='/dev/gpt/disk-e1:s20'
            whole_disk=0
            DTL=252
        children[3]
            type='disk'
            id=3
            guid=2794215380207576975
            path='/dev/gpt/disk-e1:s21'
            whole_disk=0
            DTL=251
    children[5]
        type='raidz'
        id=5
        guid=17655293908271300889
        nparity=1
        metaslab_array=27
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=5274146379037055039
            path='/dev/gpt/disk-e1:s22'
            whole_disk=0
            DTL=278
        children[1]
            type='disk'
            id=1
            guid=8651755019404873686
            path='/dev/gpt/disk-e1:s23'
            whole_disk=0
            DTL=277
        children[2]
            type='disk'
            id=2
            guid=16827379661759988976
            path='/dev/gpt/disk-e2:s0'
            whole_disk=0
            DTL=276
        children[3]
            type='disk'
            id=3
            guid=2524967151333933972
            path='/dev/gpt/disk-e2:s1'
            whole_disk=0
            DTL=275
    children[6]
        type='raidz'
        id=6
        guid=2413519694016115220
        nparity=1
        metaslab_array=26
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=16361968944335143412
            path='/dev/gpt/disk-e2:s2'
            whole_disk=0
            DTL=274
        children[1]
            type='disk'
            id=1
            guid=10054650477559530937
            path='/dev/gpt/disk-e2:s3'
            whole_disk=0
            DTL=273
        children[2]
            type='disk'
            id=2
            guid=17105959045159531558
            path='/dev/gpt/disk-e2:s4'
            whole_disk=0
            DTL=272
        children[3]
            type='disk'
            id=3
            guid=17370453969371497663
            path='/dev/gpt/disk-e2:s5'
            whole_disk=0
            DTL=271
    children[7]
        type='raidz'
        id=7
        guid=4614010953103453823
        nparity=1
        metaslab_array=24
        metaslab_shift=36
        ashift=9
        asize=7995163410432
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=10090128057592036175
            path='/dev/gpt/disk-e2:s6'
            whole_disk=0
            DTL=270
        children[1]
            type='disk'
            id=1
            guid=16676544025008223925
            path='/dev/gpt/disk-e2:s7'
            whole_disk=0
            DTL=269
        children[2]
            type='disk'
            id=2
            guid=11777789246954957292
            path='/dev/gpt/disk-e2:s8'
            whole_disk=0
            DTL=268
        children[3]
            type='disk'
            id=3
            guid=3406600121427522915
            path='/dev/gpt/disk-e2:s9'
            whole_disk=0
            DTL=267
OS:
8.1-STABLE FreeBSD 8.1-STABLE #0: Sun Sep 5 00:22:45 PDT 2010 amd64

Hardware:
Chassis: SuperMicro 847E1 (two backplanes: 24 disks in front and 12 disks in the back)
Motherboard: X8SIL
CPU: 1 x X3430 @ 2.40GHz
RAM: 16G
HDD Controller: SuperMicro / LSI 9260 (pciconf -lv: SAS1078 PCI-X Fusion-MPT SAS), 2 ports
Disks: 36 x 2T Western Digital RE4
Any help would be appreciated. Let me know what additional information I
should provide.
Thank you in advance,
--
Rumen Telbizov