Too many uncorrectable read errors with atang

Fri Nov 7 20:36:23 PST 2003

On Fri, 7 Nov 2003, John Baldwin wrote:

> On 07-Nov-2003 Kris Kennaway wrote:
> > So far this has happened (well, the panic above was new) on 5 separate
> > machines that were all working on older -current.  Now, these are all
> > IBM DeathStar drives, but previously I was only experiencing ata
> > errors every month or two, and they were correctable for another month
> > or two by /dev/zero'ing the drive.

IBM Deathstars have this annoying tendency to perform thermal
recalibration cycles that cause them to delay returning data for somewhere
between 30 and 90 seconds until the calibration finishes.  Unfortunately,
these delays seem to show up as uncorrectable errors.  It's a true pain
with RAID cards, as the RAID array will take the drive offline when it
could simply retry the read.

If you can, try to reduce the temperature of the drives.  This generally
helped my Deathstars before I got rid of them all.

Also, given the touchiness of PRML detectors, it is entirely possible that
the drive is seeing more read errors because of the solar flares and
treating them as a need to thermally recalibrate more often.

Other than tossing the drives, the only fix is for ATAng, like Windows, to
be more aggressive about retrying even uncorrectable errors, for up to a
minute or so, before giving up.
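
To make the retry idea concrete, here is a rough sketch in plain C.  It is
not ATAng code: the status codes, the callback types, the fake_drive
stand-in and the 60 second / 5 second numbers used with it are all made up
for illustration.  It only shows the shape of "keep retrying an
uncorrectable read until a time budget runs out".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, not the ones ATAng actually uses. */
enum io_status { IO_OK, IO_UNCORRECTABLE };

/* Hypothetical hooks: issue one read, and wait between attempts. */
typedef enum io_status (*read_fn)(void *dev, unsigned long lba);
typedef void (*sleep_fn)(void *dev, unsigned seconds);

/*
 * Retry a failing read until it succeeds or we have waited at least
 * budget_s seconds in total.  Returns IO_OK on success, otherwise the
 * status of the last failed attempt.
 */
enum io_status
read_with_retry(void *dev, unsigned long lba, read_fn do_read,
    sleep_fn do_sleep, unsigned budget_s, unsigned interval_s)
{
	unsigned waited = 0;

	assert(do_read != NULL && do_sleep != NULL);
	if (interval_s == 0)
		interval_s = 1;		/* avoid spinning forever */

	for (;;) {
		enum io_status st = do_read(dev, lba);

		if (st == IO_OK)
			return (IO_OK);
		if (waited >= budget_s)
			return (st);	/* budget spent, give up */
		do_sleep(dev, interval_s);
		waited += interval_s;
	}
}

/* Simulated drive that fails reads while it is "recalibrating". */
struct fake_drive {
	unsigned busy_for_s;	/* how long the recalibration lasts */
	unsigned elapsed_s;	/* simulated time that has passed */
	unsigned attempts;	/* number of reads issued */
};

enum io_status
fake_read(void *dev, unsigned long lba)
{
	struct fake_drive *d = dev;

	(void)lba;
	d->attempts++;
	return (d->elapsed_s < d->busy_for_s ? IO_UNCORRECTABLE : IO_OK);
}

void
fake_sleep(void *dev, unsigned seconds)
{
	struct fake_drive *d = dev;

	d->elapsed_s += seconds;	/* advance simulated time only */
}
```

With a budget of 60 seconds and a 5 second interval, a fake drive that is
busy for 45 seconds eventually returns its data, while one that is busy for
90 seconds still fails once the budget is used up.  A real driver would
also have to decide which errors are worth retrying at all.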

-a