SATA disks suddenly stop working

Elliot Schlegelmilch elliot+list at schlegelmilch.org
Sat Feb 28 12:06:02 PST 2009


On Thu, Feb 26, 2009 at 12:22:12PM +0100, Gary Jennejohn wrote:
> On Wed, 25 Feb 2009 21:56:38 +0200
> Alexander Motin <mav at FreeBSD.org> wrote:
> 
> > Gary Jennejohn wrote:
> > > I've been having lots of problems with SATA drives attached to higher
> > > port numbers, namely ata5 and ata6.
> > > 
> > > I was installing Linux under qemu today and it had been running for
> > > several hours and had installed multi-gigabytes of data when qemu
> > > just stopped.
> > > 
> > > I noticed that all I/O to the disk had ceased.
> > > 
> > > Doing "atacontrol reinit" on the port (ata5) resulted in a message
> > > that the device was not configured, which was patently false since
> > > qemu had just been merrily writing to it.
> > > 
> > > This with a kernel made from sources updated today at about 2 PM (GMT+1).
> > > 
> > > I've also seen problems with a disk attached to ata6.  It just sort
> > > of disappears after a while.
> > > 
> > > Disks attached to ata2, ata3 and ata4 don't exhibit any problems.
> > 
> > You have told us a lot, but at the same time given nothing that can be used.
> > 
> 
> I was only interested in whether others have seen this problem.  I was
> not looking for a solution.
> 
> > What controller do you have? Which drives are on which channels? Are
> > there any kernel messages about the problem? Have you tried enabling
> > verbose messages to get additional details?
> > 
> 
> atapci0 at pci0:0:17:0:    class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 hdr=0x00
>     vendor     = 'ATI Technologies Inc'
>     class      = mass storage
>     subclass   = SATA
> 
> There were no kernel messages at all, the drive simply hung.
> 
> I'll do a verbose boot and try to reproduce the disk hang later.
> 
> > Reinit could return ENXIO if it was already in progress. Disappearing
> > drives can also be related to that reinit. Couldn't it just be a real
> > hardware problem?
> > 
> 
> I should have mentioned that the error returned was about some IOCTL.
> Can't remember which one right now, but the error message did include
> that the device was not configured.
> 
> I've also noticed several times in the past when the problem occurred
> that the BIOS could not enumerate the AHCI disks anymore.  I had to
> do a POR.  Seems that the controller was completely hosed such that
> a simple reset didn't reinitialize it sufficiently for it to work.
> 
> This morning I booted the box and started a cvsup.  My repository is
> on a ZFS mirror with the disks on ata3 and ata4.  The system hung after
> the data from the server were received, although all the data were
> successfully written to the disks.
> 
> I couldn't do anything at all - it looked like the root disk was not
> responding and the disk light was on solid red.  I had to do a hard
> reset.
> 
> This is the first time I've seen a problem with this port.  The root
> disk is on ata2.
> 
> I rebooted and turned off MSI.  I'll monitor the situation to see
> whether that helps.

I don't mean to hijack your thread, but I've had problems with one of
my SATA disks falling off the bus.  I could usually recover it with
an atacontrol detach / attach.  However, with a recent kernel all I'm
getting is this:

ata2: <ATA channel 0> on atapci1
ata2: AHCI reset...: 2
ata2: SATA connect time=0ms
ata2: ready wait time=0ms52 (12272 MB)
ata2: software reset port 15...
ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
ata2: software reset set timeout
ata2: software reset port 0...
ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
ata2: software reset set timeout
ata2: SIGNATURE: ffffffff
ata2: Unknown signature, assuming disk device
ata2: AHCI reset done: devices=00000001
ata2: [MPSAFE]
ata2: [ITHREAD]

The same sequence repeats for each channel, up to ata7.
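
For anyone who wants to tally these timeouts per channel from a saved
verbose dmesg, here is a rough sketch in Python.  The regular expression
is only inferred from the lines quoted above, and the names TIMEOUT_RE
and count_timeouts are made up for illustration; other kernels may
format the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this message (an assumption,
# not an official format): "ataN: ahci_issue_cmd timeout: X of Yms, status=..."
TIMEOUT_RE = re.compile(
    r"^(ata\d+): ahci_issue_cmd timeout: (\d+) of (\d+)ms, status=([0-9a-fA-F]+)"
)

def count_timeouts(lines):
    """Count ahci_issue_cmd timeout messages per ATA channel."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

sample = """\
ata2: software reset port 15...
ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
ata2: software reset set timeout
ata2: software reset port 0...
ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
ata2: software reset set timeout
"""

for channel, hits in sorted(count_timeouts(sample.splitlines()).items()):
    print(channel, hits)
```

Feeding it the full boot log instead of the sample would show whether
every channel from ata2 to ata7 times out the same number of times.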

atapci0 at pci0:0:31:1:    class=0x01018a card=0x948115d9 chip=0x269e8086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
    class      = mass storage
    subclass   = ATA
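
As an aside, the chip= and class= fields in pciconf output can be
split into their parts.  The sketch below assumes the usual packing:
device ID in the upper 16 bits of chip and vendor ID in the lower 16,
and class= read as base class, subclass and programming interface, one
byte each.  The function name decode_pciconf is made up for
illustration.

```python
def decode_pciconf(chip: int, pci_class: int) -> dict:
    """Split pciconf's chip= and class= values into their component fields."""
    return {
        "vendor_id": chip & 0xFFFF,            # low 16 bits of chip=
        "device_id": (chip >> 16) & 0xFFFF,    # high 16 bits of chip=
        "base_class": (pci_class >> 16) & 0xFF,
        "subclass": (pci_class >> 8) & 0xFF,
        "prog_if": pci_class & 0xFF,
    }

# Values taken from the two pciconf entries quoted in this thread.
for label, chip, pci_class in [
    ("Intel (this message)", 0x269E8086, 0x01018A),
    ("ATI (Gary's message)", 0x43911002, 0x010601),
]:
    fields = decode_pciconf(chip, pci_class)
    print(label, {name: hex(value) for name, value in fields.items()})
```

Decoded this way, the Intel entry has subclass 0x01 (shown as ATA),
while the ATI entry has subclass 0x06 with programming interface 0x01,
the class code used for AHCI.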

The last known working kernel was from Dec 17, but a kernel rebuilt
from sources of that date doesn't see the SATA disks either (and with
the kernel that does see the disks, ZFS doesn't work).  Or perhaps I'm
using csup incorrectly.  I'm still trying to go back far enough to find
sources that work.

