Can mps drop a failing device from bus?

Dustin Wenz dustinwenz at ebureau.com
Mon Jun 4 22:59:40 UTC 2012


I asked this question back in April on the stable list but got no response ( http://lists.freebsd.org/pipermail/freebsd-stable/2012-April/067305.html ). I'm now seeing the same behavior on 9.0-RELEASE, and I thought it would be good to ask again here.

There is a failure mode for SATA disks (Seagate Barracuda ST3000DM001 disks, in this case) that the mps driver doesn't handle very well. If a disk is slow to respond, or is unresponsive altogether, I'd like it to be removed from the bus so that the zpool it's a part of becomes degraded.

The way things are now, mps just reports a lot of "SCSI command timeout on device" messages, and any I/O on the affected zpools hangs for an excessive amount of time (sometimes forever). We typically configure our storage volumes as a pool of mirrors, with the expectation that availability will be maintained if one or more redundant disks fail. Unfortunately, availability is actually made *worse* on highly redundant mirrors when mps won't give up on an unresponsive device.

It's possible that I'm overlooking an obvious solution, or some relevant configuration options for the driver. Can anyone offer some insight on this?

Thanks,

	- .Dustin


