Too many uncorrectable read errors with ATAng

Kris Kennaway kris at obsecurity.org
Fri Nov 7 20:48:00 PST 2003


On Fri, Nov 07, 2003 at 08:36:28PM -0800, Andrew P. Lentvorski, Jr. wrote:
> On Fri, 7 Nov 2003, John Baldwin wrote:
> 
> > On 07-Nov-2003 Kris Kennaway wrote:
> > > So far this has happened (well, the panic above was new) on 5 separate
> > > machines that were all working on older -current.  Now, these are all
> > > IBM DeathStar drives, but previously I was only experiencing ATA
> > > errors every month or two, and they were correctable for another month
> > > or two by /dev/zero'ing the drive.
> 
> IBM Deathstars have an annoying tendency to perform thermal
> recalibration cycles that delay returning data by somewhere between
> 30 and 90 seconds, until the calibration finishes.  Unfortunately,
> these delays seem to show up as uncorrectable errors.  It's a real pain
> with RAID cards, because the RAID array will take the drive offline when
> it could simply retry the read.
> 
> If you can, try to reduce the temperature of the drives.  This generally
> helped my Deathstars before I got rid of them all.
> 
> Also, given the touchiness of PRML detectors, it is entirely possible that
> the drive is interpreting increased read errors caused by the solar flares
> as a need to thermally recalibrate more often.
> 
> Other than tossing the drives, ATAng, like Windows, would have to be more
> aggressive about retrying even uncorrectable errors for up to a minute or
> so before giving up.

Thanks, that's interesting; perhaps there's something sos can do here.
Unfortunately the drives in question are in Yahoo's datacenter, so I
do not have physical access.
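The retry policy Andrew suggests (keep retrying a failed read for up to about a minute before declaring it uncorrectable) can be sketched in C as below. This is a hypothetical illustration only, not ATAng's code or API: the names `read_attempt_fn`, `retry_until_deadline`, `flaky_drive` and `flaky_read` are invented for this sketch, and the read attempt is abstracted as a callback. `flaky_drive` simulates a drive that fails a fixed number of attempts (as during a thermal recalibration) and then succeeds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true if one read attempt succeeded. */
typedef bool (*read_attempt_fn)(void *ctx);

/*
 * Hypothetical helper: retry `attempt` until it succeeds or until another
 * retry would exceed `deadline_s` seconds, sleeping `delay_s` between tries.
 * Returns the number of attempts on success, or -1 if the deadline passed.
 */
int retry_until_deadline(read_attempt_fn attempt, void *ctx,
                         double deadline_s, double delay_s)
{
    struct timespec start, now, pause;

    assert(deadline_s >= 0.0 && delay_s >= 0.0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    pause.tv_sec = (time_t)delay_s;
    pause.tv_nsec = (long)((delay_s - (double)pause.tv_sec) * 1e9);

    for (int tries = 1;; tries++) {
        if (attempt(ctx))
            return tries;
        clock_gettime(CLOCK_MONOTONIC, &now);
        double elapsed = (double)(now.tv_sec - start.tv_sec) +
                         (double)(now.tv_nsec - start.tv_nsec) / 1e9;
        /* Give up rather than start a retry that would overrun the deadline. */
        if (elapsed + delay_s > deadline_s)
            return -1;
        nanosleep(&pause, NULL);
    }
}

/* Simulated drive: fails `failures_left` reads, then succeeds. */
struct flaky_drive {
    int failures_left;
};

static bool flaky_read(void *ctx)
{
    struct flaky_drive *d = ctx;
    if (d->failures_left > 0) {
        d->failures_left--;
        return false;
    }
    return true;
}
```

With a 60-second deadline (Andrew's "up to a minute"), a drive that is busy recalibrating would be retried rather than failed immediately; the trade-off is that a genuinely bad sector stalls the caller for the full deadline before the error is reported.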

Kris


More information about the freebsd-current mailing list