Hard disk woes

Jason Morgan jwm-freebsd at sentinelchicken.net
Mon Sep 5 10:19:14 PDT 2005


On Mon, Sep 05, 2005 at 03:16:13PM +0000, Michael Abbott wrote:
> I'm having some very odd behaviour from one of my hard disks and I wonder
> what anybody makes of it.
> 
> In brief, the hard disk in question works just fine much of the time, but
> when high volume data transfers are requested I get the following in
> /var/log/messages:
> 
> Sep  3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:02 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:12 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:12 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:23 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:23 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:33 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:33 saturn /kernel: ad6: trying fallback to PIO mode
> Sep  3 15:21:33 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:43 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:43 saturn /kernel: ata3: resetting devices .. ata3-slave: ATA identify retries exceeded
> Sep  3 15:21:43 saturn /kernel: done
> 
> After this point the hard disk in question is frozen until I reboot, and
> any process that tries to touch it is similarly frozen (doesn't even
> respond to kill -9).  `shutdown -r` is enough to restore operation, and
> the rest of the system seemed happy enough.
> 
> Another interesting effect.  I placed a replacement hard disk on the same
> ATA bus (as a slave, device ad7) and tried copying files from ad6 to ad7.
> This time when ad6 froze and the kernel decided to give up on ata3 (and so
> decided to disable ad7 at the same time, naturally enough) the entire
> system froze!  No response from the console, stone cold dead, hard reset
> needed.
> 
> 
> So some questions seem to me to arise from this.
> 
> 1.  Why does FreeBSD handle this so ungracefully?  If restarting is
> sufficient to bring ata3 back then can't the ata driver do a proper
> restart?
> 
> 2.  Goodness me, FreeBSD froze!  I know it's a hardware failure, but
> still: it's on an auxiliary ATA controller with no system files attached.
> Is this problem of general interest?  It's certainly a massive hint to me
> not to consider (parallel) ATA for RAID!
> 
> 3.  Any thoughts on what is wrong with the hard disk in question?  I've
> changed ATA controllers, so it seems to be the disk, not the controller.
> The behaviour is very odd.  If I copy files off one at a time, e.g. using:
>  	find . -type f -exec cp {} "$TARGET/"{} \; -exec echo -n '.' \;
> the disk seems to hang in there, but if I just do
>  	cp -R . "$TARGET"
> then it freezes!  (This statement may not have been thoroughly tested:
> having to restart each time gets old quite quickly.)
> 
> 
> Ok, now for the boring bits.
> 
> $ uname -a
> FreeBSD saturn.araneidae.co.uk 4.11-RELEASE-p11 FreeBSD 4.11-RELEASE-p11 #6: Sat Aug 27 16:33:58 GMT 2005     root at saturn.araneidae.co.uk:/usr/obj/usr/src/sys/GENERIC  i386
> $ dmesg | grep ata
> atapci0: <HighPoint HPT370 ATA100 controller> port 0xa000-0xa0ff,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 irq 12 at device 11.0 on pci0
> ata2: at 0x9000 on atapci0
> ata3: at 0x9800 on atapci0
> atapci1: <VIA 8233 ATA133 controller> port 0xa800-0xa80f at device 17.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci1
> ata1: at 0x170 irq 15 on atapci1
> atapci2: <HighPoint HPT372 ATA133 controller> port 0xc400-0xc4ff,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 irq 10 at device 19.0 on pci0
> ata4: at 0xb400 on atapci2
> ata5: at 0xbc00 on atapci2
> ad0: 39083MB <Maxtor 4D040H2> [79408/16/63] at ata0-master UDMA100
> ad1: 190782MB <SAMSUNG SP2014N> [387621/16/63] at ata0-slave UDMA133
> ad4: 76319MB <ST380021A> [155061/16/63] at ata2-master UDMA100
> ad6: 76319MB <ST380021A> [155061/16/63] at ata3-master UDMA100
> acd0: DVD-ROM <CREATIVEDVD-ROM DVD2240E 12/24/97> at ata1-master PIO4
> $ sudo atacontrol cap ata3 0
> ATA channel 3, Master, device ad6:
> 
> ATA/ATAPI revision    5
> device model          ST380021A
> serial number         3HV0MYL9
> firmware revision     3.10
> cylinders             16383
> heads                 16
> sectors/track         63
> lba supported         156301488 sectors
> lba48 not supported
> dma supported
> overlap not supported
> 
> Feature                      Support  Enable    Value   Vendor
> write cache                    yes      yes
> read ahead                     yes      yes
> dma queued                     no       no      0/00
> SMART                          yes      no
> microcode download             yes      yes
> security                       yes      no
> power management               yes      yes
> advanced power management      no       no      65278/FEFE
> automatic acoustic management  yes      yes     128/80  128/80
> $
> 
> That's everything I can think of.
> 

Just a general comment:

I had a very similar problem a while back. After replacing the drive in
question, then replacing the motherboard, I discovered it was a power
issue. The power supply was freaking out at medium to high loads, which
was causing the device to continually reset.

Jason


More information about the freebsd-questions mailing list