raidz2 recovery problem on 8.0p2
Kurt Lidl
kurt.lidl at cello.com
Mon May 3 20:34:23 UTC 2010
I have a machine with 12GB of memory and an mpt controller in it,
running a ZFS raidz2 for (test) data storage. The system also has a
ZFS mirror in place for the OS, home directories, etc.
I manually failed one of the disks in the JBOD shelf and watched as the
mpt controller started logging errors. Ultimately, I tried to reboot the
machine, but it panicked instead of rebooting cleanly. It also failed to
write a crash dump (it got about 200MB into the dump and stopped).
Upon reboot, I saw that ZFS thought there were two da6 disk devices,
which was strange, since at this point the machine should have had
da0 through da6. I issued a 'zpool clear media da6' command, but
that didn't resolve anything.
Then I plugged the drive back into the JBOD and rebooted.
Now I see the following:
user at host: zpool status media
  pool: media
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da6     FAULTED      0    98     0  corrupted data

errors: No known data errors
Note that there are *two* da6 devices listed, at least from zpool's
point of view.
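The duplicate entry can also be picked out mechanically. Below is a
minimal sketch in Python, assuming the device lines from the 'zpool
status media' output above have been pasted in as text. The function
name find_duplicate_vdevs is hypothetical and is not part of any ZFS
tool.

```python
from collections import Counter

# Device lines copied from the 'zpool status media' output above.
STATUS_CONFIG = """\
media DEGRADED 0 0 0
raidz2 DEGRADED 0 0 0
da0 ONLINE 0 0 0
da1 ONLINE 0 0 0
da2 ONLINE 0 0 0
da3 ONLINE 0 0 0
da4 ONLINE 0 0 0
da5 ONLINE 0 0 0
da6 ONLINE 0 0 0
da6 FAULTED 0 98 0 corrupted data
"""


def find_duplicate_vdevs(config_text):
    """Return the names that appear more than once in the config lines.

    Hypothetical helper: it only looks at the first column of each
    non-empty line and knows nothing about ZFS itself.
    """
    names = [line.split()[0] for line in config_text.splitlines() if line.strip()]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(find_duplicate_vdevs(STATUS_CONFIG))
```

With the lines above, the only name reported is da6.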
The dmesg output reports this:
da0 at mpt0 bus 0 target 8 lun 0
da0: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1 at mpt0 bus 0 target 9 lun 0
da1: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2 at mpt0 bus 0 target 10 lun 0
da2: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3 at mpt0 bus 0 target 11 lun 0
da3: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da4 at mpt0 bus 0 target 12 lun 0
da4: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da5 at mpt0 bus 0 target 13 lun 0
da5: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da6 at mpt0 bus 0 target 14 lun 0
da6: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da7 at mpt0 bus 0 target 15 lun 0
da7: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
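Comparing the two listings makes the mismatch explicit: dmesg shows
eight disks, da0 through da7, while the pool configuration names da6
twice and never mentions da7. The following is a minimal sketch in
Python, assuming the attach lines and pool member names are copied by
hand from the outputs quoted in this message. The helper names are
hypothetical. It only reports the discrepancy and makes no claim about
which physical disk the FAULTED entry refers to.

```python
import re

# "daN at mpt0 ..." attach lines copied from the dmesg output above,
# plus one non-attach line to show that it is ignored.
DMESG_LINES = [
    "da0 at mpt0 bus 0 target 8 lun 0",
    "da0: 300.000MB/s transfers",
    "da1 at mpt0 bus 0 target 9 lun 0",
    "da2 at mpt0 bus 0 target 10 lun 0",
    "da3 at mpt0 bus 0 target 11 lun 0",
    "da4 at mpt0 bus 0 target 12 lun 0",
    "da5 at mpt0 bus 0 target 13 lun 0",
    "da6 at mpt0 bus 0 target 14 lun 0",
    "da7 at mpt0 bus 0 target 15 lun 0",
]

# Member names as listed by 'zpool status media' (da6 appears twice).
POOL_MEMBERS = ["da0", "da1", "da2", "da3", "da4", "da5", "da6", "da6"]


def probed_disks(lines):
    """Return the da device names found on attach lines, in order."""
    found = []
    for line in lines:
        match = re.match(r"(da\d+) at mpt\d+ ", line)
        if match:
            found.append(match.group(1))
    return found


def missing_from_pool(probed, members):
    """Return probed device names that the pool listing never mentions."""
    return sorted(set(probed) - set(members))


if __name__ == "__main__":
    disks = probed_disks(DMESG_LINES)
    print("probed:", disks)
    print("not named in pool:", missing_from_pool(disks, POOL_MEMBERS))
```

For the data above, the only probed disk that the pool listing does not
name is da7.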
Any suggestions about how to get this raid back into a non-degraded state?
For whatever it's worth, 'uname -a' reports:
FreeBSD host.fairview-park.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Thanks for any help.
-Kurt