Hard disk woes

Michael Abbott michael at araneidae.co.uk
Mon Sep 5 08:16:22 PDT 2005


I'm having some very odd behaviour from one of my hard disks and I wonder
what anybody makes of it.

In brief, the hard disk in question works just fine much of the time, but
when high volume data transfers are requested I get the following in
/var/log/messages:

Sep  3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:02 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:12 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:12 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:23 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:23 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:33 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:33 saturn /kernel: ad6: trying fallback to PIO mode
Sep  3 15:21:33 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:43 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:43 saturn /kernel: ata3: resetting devices .. ata3-slave: ATA identify retries exceeded
Sep  3 15:21:43 saturn /kernel: done

After this point the hard disk in question is frozen until I reboot, and
any process that tries to touch it is similarly frozen (it doesn't even
respond to kill -9).  `shutdown -r now` is enough to restore operation, and
the rest of the system seems happy enough in the meantime.
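
To get a rough idea of how often each device times out, the log lines
above can be tallied.  This is only a sketch: the function name
count_timeouts is made up for illustration, and it assumes the syslog
layout shown above, where the device name is the sixth
whitespace-separated field.

```sh
#!/bin/sh
# count_timeouts: read syslog lines on stdin and print "device count"
# for every device that logged a "command timeout" message.
# Assumption: lines look like
#   Sep  3 15:21:02 saturn /kernel: ad6: READ command timeout ...
# so the device name (with a trailing colon) is field 6.
count_timeouts() {
    awk '/command timeout/ {
             dev = $6
             sub(/:$/, "", dev)     # strip the trailing colon
             n[dev]++
         }
         END { for (d in n) print d, n[d] }'
}
```

Typical use would be `count_timeouts < /var/log/messages`.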

Another interesting effect.  I placed a replacement hard disk on the same
ATA bus (as a slave, device ad7) and tried copying files from ad6 to ad7.
This time when ad6 froze and the kernel decided to give up on ata3 (and so
decided to disable ad7 at the same time, naturally enough) the entire
system froze!  No response from the console, stone cold dead, hard reset
needed.


This raises a few questions.

1.  Why does FreeBSD handle this so ungracefully?  If restarting is
sufficient to bring ata3 back then can't the ata driver do a proper
restart?

2.  Goodness me, FreeBSD froze!  I know it's a hardware failure, but
still: it's on an auxiliary ATA controller with no system files attached.
Is this problem of general interest?  It's certainly a massive hint to me
not to consider (parallel) ATA for RAID!

3.  Any thoughts on what is wrong with the hard disk in question?  I've
changed ATA controllers, so it seems to be the disk, not the controller.
The behaviour is very odd.  If I copy files off one at a time, e.g. using:
  	find . -type f -exec cp {} "$TARGET/"{} \; -exec echo -n '.' \;
the disk seems to hang in there, but if I just do
  	cp -R . "$TARGET"
then it freezes!  (This statement may not have been thoroughly tested:
having to restart each time gets old quite quickly.)
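
For what it's worth, the one-file-at-a-time approach can be wrapped up
so that the target directories are created first.  This is a sketch
rather than exactly what I ran: the function name copy_one_by_one is
made up for illustration, and it assumes the destination is given as
an absolute path (the function changes directory into the source).

```sh
#!/bin/sh
# copy_one_by_one SRC DST
# Copy the tree under SRC into DST with one cp invocation per file,
# printing a dot for each file copied.
# Assumption: DST is an absolute path, because we cd into SRC first.
copy_one_by_one() {
    src=$1
    dst=$2
    (
        cd "$src" || exit 1
        # Recreate the directory tree so every cp has somewhere to land.
        find . -type d -exec sh -c 'mkdir -p "$1/$2"' sh "$dst" {} \;
        # Copy the files individually; a failed cp prints no dot.
        find . -type f -exec sh -c 'cp -p "$2" "$1/$2" && printf .' sh "$dst" {} \;
    )
    echo
}
```

Usage would be something like `copy_one_by_one /mnt/old "$TARGET"`,
where /mnt/old is a placeholder for wherever the failing disk is
mounted.  Passing the paths to `sh -c` as arguments avoids relying on
find expanding {} inside a longer argument.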


Ok, now for the boring bits.

$ uname -a
FreeBSD saturn.araneidae.co.uk 4.11-RELEASE-p11 FreeBSD 4.11-RELEASE-p11 #6: Sat Aug 27 16:33:58 GMT 2005     root at saturn.araneidae.co.uk:/usr/obj/usr/src/sys/GENERIC  i386
$ dmesg | grep ata
atapci0: <HighPoint HPT370 ATA100 controller> port 0xa000-0xa0ff,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 irq 12 at device 11.0 on pci0
ata2: at 0x9000 on atapci0
ata3: at 0x9800 on atapci0
atapci1: <VIA 8233 ATA133 controller> port 0xa800-0xa80f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
atapci2: <HighPoint HPT372 ATA133 controller> port 0xc400-0xc4ff,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 irq 10 at device 19.0 on pci0
ata4: at 0xb400 on atapci2
ata5: at 0xbc00 on atapci2
ad0: 39083MB <Maxtor 4D040H2> [79408/16/63] at ata0-master UDMA100
ad1: 190782MB <SAMSUNG SP2014N> [387621/16/63] at ata0-slave UDMA133
ad4: 76319MB <ST380021A> [155061/16/63] at ata2-master UDMA100
ad6: 76319MB <ST380021A> [155061/16/63] at ata3-master UDMA100
acd0: DVD-ROM <CREATIVEDVD-ROM DVD2240E 12/24/97> at ata1-master PIO4
$ sudo atacontrol cap ata3 0
ATA channel 3, Master, device ad6:

ATA/ATAPI revision    5
device model          ST380021A
serial number         3HV0MYL9
firmware revision     3.10
cylinders             16383
heads                 16
sectors/track         63
lba supported         156301488 sectors
lba48 not supported
dma supported
overlap not supported

Feature                      Support  Enable    Value   Vendor
write cache                    yes      yes
read ahead                     yes      yes
dma queued                     no       no      0/00
SMART                          yes      no
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/FEFE
automatic acoustic management  yes      yes     128/80  128/80
$

That's everything I can think of.



More information about the freebsd-questions mailing list