A failed drive causes system to hang

Zaphod Beeblebrox zbeeble at gmail.com
Sun Apr 14 18:58:16 UTC 2013


I'd like to throw in my two cents here.  I've seen this (drives in RAID-1
configuration) hanging whole systems.  Back in the IDE days, two drives
were connected with one cable --- I largely wrote it off as a deficiency of
IDE hardware and resolved to buy SCSI hardware for more important systems.
Of late, the physical hardware for SCSI (SAS) and SATA drives has
converged.  I'm willing to accept that SAS hardware may be built to a
different standard, but I'm suspicious of the fact that a bad SATA drive on
an ACH* controller can hang the whole system.

The hang is not complete, however.  Often pulling the drive's cable will
unfreeze things.  It's also not entirely consistent.  Drives I have behind
4:1 port multipliers haven't (so far) hung the system that they're on
(which uses ACH10).  Right now, I have a remote ACH10 system that's hung
hard a couple of times --- and it passes both its short and long SMART
tests on both drives.

Is there no global timeout we can depend on here?


More information about the freebsd-fs mailing list