Dying disk

Tue Nov 22 20:14:40 PST 2005

On Wed, Nov 23, 2005 at 10:33:17 +0700, Olivier Nicole wrote:
> > > Nov 22 17:42:50 ufo kernel: Copied 18 bytes of sense data offset 12: 0xf1 0x0 0x3 0x0 0x51 0x3b 0x9f 0xa 0x0 0x0 0x0 0x0 0xc 0x0 0xd3 0x80 0x0 0x18
> > > Nov 22 17:42:50 ufo kernel: (da0:ahd1:0:0:0): WRITE(10). CDB: 2a 0 0 51 ac 7f 0 0 4 0 
> > > Nov 22 17:42:50 ufo kernel: (da0:ahd1:0:0:0): CAM Status: SCSI Status Error
> > > Nov 22 17:42:50 ufo kernel: (da0:ahd1:0:0:0): SCSI Status: Check Condition
> > > Nov 22 17:42:50 ufo kernel: (da0:ahd1:0:0:0): Deferred Error: MEDIUM ERROR info:513b9f asc:c,0
> > > Nov 22 17:42:50 ufo kernel: (da0:ahd1:0:0:0): Write error field replaceable unit: d3 actual retry count: 24
> > > Nov 22 17:42:50 ufo kernel: (da0:ahd1:0:0:0): Retrying Command (per Sense Data)
> > 
> > This means that you're starting to get failing sectors.  The disk can 
> > automatically remap them if you are writing to them, but can't do 
> 
> Is there a way to tell what sector(s) is bad/what partition is
> concerned?

The sector in question is specified in the info: field in the sense
information.  In this case, it's sector 0x513b9f.  Note that since this is
a deferred error, it generally won't have anything to do with the write
that apparently failed.  (i.e. pay no attention to the CDB for the command;
it's really a different command that failed.)
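As a rough illustration, the 18 bytes of sense data in the log can be decoded by hand. The sketch below is a minimal Python decoder, assuming the standard fixed-format sense layout from the SCSI Primary Commands (SPC) spec. It only handles the fields that show up in this log. The `decode_fixed_sense` name and the small `SENSE_KEYS` table are illustrative, not part of any FreeBSD or CAM API. The INFORMATION field (bytes 3-6) gives the 0x513b9f that CAM printed, and response code 0x71 (0xf1 with the VALID bit stripped) is what marks the error as deferred.

```python
# Minimal decoder for fixed-format SCSI sense data (response codes 0x70/0x71).
# Only the fields that appear in the log above are handled.

# The 18 sense bytes from the "Copied 18 bytes of sense data" log line.
SENSE = bytes([0xF1, 0x00, 0x03, 0x00, 0x51, 0x3B, 0x9F, 0x0A, 0x00,
               0x00, 0x00, 0x00, 0x0C, 0x00, 0xD3, 0x80, 0x00, 0x18])

# Small subset of sense key names (illustrative, not exhaustive).
SENSE_KEYS = {0x1: "RECOVERED ERROR", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}


def decode_fixed_sense(sense: bytes) -> dict:
    """Return the interesting fields of a fixed-format sense buffer."""
    response_code = sense[0] & 0x7F       # 0x70 = current, 0x71 = deferred
    info_valid = bool(sense[0] & 0x80)    # VALID bit: INFORMATION field is meaningful
    key = sense[2] & 0x0F
    fields = {
        "deferred": response_code == 0x71,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        # Bytes 3-6: INFORMATION field, big-endian (here, the failing LBA).
        "info": int.from_bytes(sense[3:7], "big") if info_valid else None,
        "asc": sense[12],                 # additional sense code
        "ascq": sense[13],                # additional sense code qualifier
        "fru": sense[14],                 # field replaceable unit code
        "retry_count": None,
    }
    # Bytes 15-17 are sense-key specific; SKSV (bit 7 of byte 15) marks them
    # valid. For RECOVERED/MEDIUM/HARDWARE ERROR, bytes 16-17 hold the
    # actual retry count.
    if sense[15] & 0x80 and key in (0x1, 0x3, 0x4):
        fields["retry_count"] = int.from_bytes(sense[16:18], "big")
    return fields


if __name__ == "__main__":
    f = decode_fixed_sense(SENSE)
    print(f["deferred"], f["sense_key"], hex(f["info"]), f["info"])
    print("asc/ascq:", hex(f["asc"]), hex(f["ascq"]),
          "fru:", hex(f["fru"]), "retries:", f["retry_count"])
```

With the bytes from the log, this reports a deferred MEDIUM ERROR at LBA 0x513b9f (5323679 decimal), ASC/ASCQ 0xc/0x0, FRU 0xd3 and 24 retries, which matches what CAM printed.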

This error happens when you have write caching turned on, and the drive has
already acknowledged the write with good status, but then later has
problems when it actually tries to write the data to that sector on the
disk.  So it remaps that sector to one of its spare sectors, and informs
you by throwing a deferred error on the next write command it sees.
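The CDB in the log bears this out. Below is a minimal Python sketch that decodes the logged CDB. It assumes the standard WRITE(10) layout: opcode 0x2a, a 4-byte LBA in bytes 2-5 and a 2-byte transfer length in bytes 7-8. The `decode_write10` helper is hypothetical. The write that carried the error report targeted LBAs 0x51ac7f through 0x51ac82, which does not include sector 0x513b9f.

```python
# Decode the WRITE(10) CDB from the log: "2a 0 0 51 ac 7f 0 0 4 0".
CDB = bytes([0x2A, 0x00, 0x00, 0x51, 0xAC, 0x7F, 0x00, 0x00, 0x04, 0x00])
DEFERRED_LBA = 0x513B9F  # LBA from the info: field of the sense data


def decode_write10(cdb: bytes) -> tuple:
    """Return (starting LBA, transfer length in blocks) of a WRITE(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, length


if __name__ == "__main__":
    lba, length = decode_write10(CDB)
    covers = lba <= DEFERRED_LBA < lba + length
    print(f"write covers {hex(lba)}..{hex(lba + length - 1)}; "
          f"includes {hex(DEFERRED_LBA)}: {covers}")
```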

> I did a physical disk verify (from Adaptec BIOS at reboot) that
> detected nothing bad.

You'll probably see this block in the grown defect list.  A verify probably
wouldn't find a problem with that sector, since it was remapped.  To see
the grown defect list, try this:

camcontrol defects da0 -f phys -G

It'll be difficult to map the physical sector descriptions in the output
(cylinder, head and sector) back to the LBA you normally use when talking
to the disk.  A few disks (not many) support the block format, so you could
try -f block instead and see if that works.

> I am afraid that the error message will not be good enough evidence
> to claim warranty :(

Some drive manufacturers have a disk verification utility that will
generate pass/fail status for a drive.  Sometimes they'll require output
from that utility in order to take a drive back for warranty replacement.
See if your drive manufacturer has such a utility and see what it says.
(It may be a DOS or Windows utility, though...)

This sort of thing is why I'd recommend some sort of RAID setup for most
machines now.  The cost of a disk failure in time, inconvenience and
replacing the data often outweighs the cost of buying an extra disk or
disks and setting up RAID (not RAID-0 :) in software or hardware.

Ken
-- 
Kenneth Merry
ken at kdm.org