Western Digital hard disks and ATA timeouts

Jonas Lund whizzter at gmail.com
Fri Nov 7 13:10:16 PST 2008


As I'm writing this I'm trying to rescue the contents of another computer's disk.

Something in the seek heads (or something related to them) is
physically half-broken, so the disk might need up to 10 retries just to
read a sector; once read, however, it's usually no problem. I'm using
myrescue (running on 6.2, so I don't know if it's included in the
current ports, but if anyone wants to run it on FreeBSD I've done the
"gruntwork" for porting), so all the timeouts aren't a really big issue,
as it'll try to read that sector again later. But had I had
the sysctl I would've been a tad happier right now.
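
To illustrate the "skip it now, try again later" approach, here is a minimal
sketch in Python. It is not myrescue's actual code; read_block and the
max_passes value are hypothetical names chosen for the example, and a real
tool would read from the raw device and write to an image file.

```python
def rescue(read_block, n_blocks, max_passes=10):
    """Multi-pass copy: skip blocks that fail and retry them in later passes.

    read_block(i) is a hypothetical callable that returns the bytes of
    block i or raises OSError (e.g. after an ATA timeout).
    Returns (recovered, still_bad): a dict of block index -> data and
    the set of block indexes that never read successfully.
    """
    recovered = {}
    pending = list(range(n_blocks))
    for _ in range(max_passes):
        still_bad = []
        for i in pending:
            try:
                recovered[i] = read_block(i)
            except OSError:
                # Don't stall on a bad block; come back to it next pass.
                still_bad.append(i)
        pending = still_bad
        if not pending:
            break
    return recovered, set(pending)
```

With this scheme a timeout only costs one failed attempt per pass, so a short
timeout is tolerable, though a tunable one would still save time.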

As for the default being a small value, I personally think it's better
to throw out some messages/errors early on, before the disk reaches a
catastrophic state. (At least on 6.2 the kernel will put out a message
for each retry without giving faults; maybe allow more retries before
throwing an error?)
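
The policy I have in mind (warn on every retry, only fail after a larger
number of them) could be sketched like this. This is an illustration in
Python, not the ATA driver's logic; read_once, max_retries and warn are
made-up names for the example.

```python
def read_with_warnings(read_once, max_retries=10, warn=print):
    """Retry a read, warning on each failure and erroring only at the end.

    read_once is a hypothetical callable that returns data or raises
    OSError; warn receives one message per failed attempt.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return read_once()
        except OSError as exc:
            # Early warning: visible to the admin, but not yet a fault.
            warn(f"read failed (attempt {attempt}/{max_retries}): {exc}")
    raise OSError(f"giving up after {max_retries} attempts")
```

The warnings show up from the first failed attempt, while the hard error is
delayed until the retries are used up.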

By catastrophic state I'm referring to that oh-so-famous Google paper
that said that once a disk has started showing errors it doesn't
have long to live. I do trust that conclusion, as I've been
"warned" by these messages 2 times but ignored them until the disk
went really bad.

The main thing I'm trying to get across is that early warnings and
small problems are a helluva lot better than big disasters. Think of it
like the oil meter on your car: it's not like you're gonna go out and
drive 100s of km in the wilderness if you know that the car is in a
bad state. (Now if only SMART info was reliable!)

/ Jonas

2008/11/7 Peter Wemm <peter at wemm.org>:
> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick <koitsu at freebsd.org> wrote:
> [..]
>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
>> is not adjustable without editing the ATA code yourself and increasing
>> the value.  The FreeNAS folks have made patches available to turn the
>> timeout value into a sysctl.
>>
>> Soren and/or others, please increase this timeout value.  Five seconds
>> has now been deemed too aggressive a default.  And please consider
>> migrating the timeout value into a sysctl.
>
> The 5 second timeout has been a problem for quite a while actually.
> I've had a number of instances where I've had to increase it to 20 or
> 30 seconds when recovering from marginal drives.  The longest
> "successful" recovery attempt I've seen was 26 seconds, I believe on a
> Maxtor drive a few years ago.   ("successful" == the drive spent 26
> seconds but eventually successfully read the sector).  Even the IBM
> death star drives could take much longer than 5 seconds to do a
> recovery 5 years ago.  5 seconds has never been a good default.
>
> I think the timeout should be increased to at least 30 seconds.  My
> windows box has a timeout that goes for several minutes.
>
> If there is concern about FreeBSD appearing to hang, I could imagine
> that a console warning message could be printed after 5 seconds.  But
> just say "drive has not yet responded".  But give it more time.
>
> In this day and age we're generally not playing games with udma33 vs
> 66, notched cables, poor CRC support etc.  SATA seems to have
> eliminated all that.  Hmm, it might make sense to increase the timeout
> on SATA connections to 2 or 3 minutes by default.
> --
> Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
> "All of this is for nothing if we don't go to the stars" - JMS/B5
> "If Java had true garbage collection, most programs would delete
> themselves upon execution." -- Robert Sewell
> _______________________________________________
> freebsd-hardware at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
> To unsubscribe, send any mail to "freebsd-hardware-unsubscribe at freebsd.org"
>

