SATA disks suddenly stop working
Alexander Motin
mav at mavhome.dp.ua
Sun Mar 1 07:58:35 PST 2009
Elliot Schlegelmilch wrote:
> On Thu, Feb 26, 2009 at 12:22:12PM +0100, Gary Jennejohn wrote:
>> On Wed, 25 Feb 2009 21:56:38 +0200
>> Alexander Motin <mav at FreeBSD.org> wrote:
>>
>>> Gary Jennejohn wrote:
>>>> I've been having lots of problems with SATA drives attached to higher
>>>> port numbers, namely ata5 and ata6.
>>>>
>>>> I was installing Linux under qemu today and it had been running for
>>>> several hours and had installed multi-gigabytes of data when qemu
>>>> just stopped.
>>>>
>>>> I noticed that all I/O to the disk had ceased.
>>>>
>>>> Doing "atacontrol reinit" on the port (ata5) resulted in a message
>>>> that the device was not configured, which was patently false since
>>>> qemu had just been merrily writing to it.
>>>>
>>>> This with a kernel made from sources updated today at about 2 PM (GMT+1).
>>>>
>>>> I've also seen problems with a disk attached to ata6. It just sort
>>>> of disappears after a while.
>>>>
>>>> Disks attached to ata2, ata3 and ata4 don't exhibit any problems.
>>> You have said a lot, but at the same time given nothing that can be used.
>>>
>> I was only interested in whether others have seen this problem. I was
>> not looking for a solution.
>>
>>> What controller do you have? What drives on what channels? Is there any
>>> kernel messages about the problem? Have you tried to enable verbose
>>> messages to get additional details?
>>>
>> atapci0 at pci0:0:17:0: class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 hdr=0x00
>> vendor = 'ATI Technologies Inc'
>> class = mass storage
>> subclass = SATA
>>
>> There were no kernel messages at all, the drive simply hung.
>>
>> I'll do a verbose boot and try to reproduce the disk hang later.
>>
>>> Reinit could return ENXIO if it was already in progress. Disappearing
>>> drives can also be related to that reinit. Couldn't it just be a real
>>> hardware problem?
>>>
>> I should have mentioned that the error returned was about some IOCTL.
>> Can't remember which one right now, but the error message did include
>> that the device was not configured.
>>
>> I've also noticed several times in the past when the problem occurred
>> that the BIOS could not enumerate the AHCI disks anymore. I had to
>> do a POR. Seems that the controller was completely hosed such that
>> a simple reset didn't reinitialize it sufficiently for it to work.
>>
>> This morning I booted the box and started a cvsup. My repository is
>> on a ZFS mirror with the disks on ata3 and ata4. The system hung after
>> the data from the server were received, although all the data were
>> successfully written to the disks.
>>
>> I couldn't do anything at all - it looked like the root disk was not
>> responding and the disk light was on solid red. I had to do a hard
>> reset.
>>
>> This is the first time I've seen a problem with this port. The root
>> disk is on ata2.
>>
>> I rebooted and turned off MSI. I'll monitor the situation to see
>> whether that helps.
>
> I don't mean to hijack your thread, but I've had problems with one of
> my SATA disks falling off the bus. I could usually retrieve it with
> an atacontrol detach / reattach. However, with a recent kernel all I'm
> getting is this:
>
> ata2: <ATA channel 0> on atapci1
> ata2: AHCI reset...: 2
> ata2: SATA connect time=0ms
> ata2: ready wait time=0ms52 (12272 MB)
> ata2: software reset port 15...
> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
> ata2: software reset set timeout
> ata2: software reset port 0...
> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
> ata2: software reset set timeout
> ata2: SIGNATURE: ffffffff
> ata2: Unknown signature, assuming disk device
> ata2: AHCI reset done: devices=00000001
> ata2: [MPSAFE]
> ata2: [ITHREAD]
>
> One for each channel, up to ata7.
Does this happen during boot? If not, what do you mean by being unable to
reattach the drive now?
> atapci0 at pci0:0:31:1: class=0x01018a card=0x948115d9 chip=0x269e8086 rev=0x09 hdr=0x00
> vendor = 'Intel Corporation'
> device = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
> class = mass storage
> subclass = ATA
>
> The last known kernel which works was Dec 17, but trying to rebuild a
> kernel from that date doesn't see the SATA disks either (as the kernel
> which sees the disks zfs doesn't work.) Or perhaps I'm csup'ing
> incorrectly.
Have you tried simply booting the previous kernel from the kernel.old
directory? Or have you already overwritten it with
> I'm still trying to back up far enough so it will work.
Sources from Feb 14 should be fine. I changed the reset sequence on Feb 15.
When you manage to boot, could you try a few experiments against HEAD?
Maybe one of them will fix the problem:
1) comment out this line inside ata_ahci_issue_cmd():
    ATA_OUTL(ctlr->r_res2, ATA_AHCI_P_FBS + offset, (port << 8) |
        0x00000001);
2) comment out these lines inside ata_sata_phy_reset():
    if ((ATA_IDX_INL(ch, ATA_SCONTROL) & ATA_SC_DET_MASK) ==
        ATA_SC_DET_IDLE)
        return ata_sata_connect(ch);
3) comment out the first occurrence of this line inside ata_ahci_softreset():
    return (-1);
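
For reference, here is a minimal standalone sketch of the bit arithmetic behind experiments 1 and 2. It is not the driver code. The macro and function names are hypothetical, and the bit positions are assumed from the AHCI 1.3 (PxFBS) and SATA (SControl) register descriptions: DEV in bits 11:8 and EN in bit 0 of PxFBS, DET in bits 3:0 of SControl with 0 meaning no reset requested.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names. Bit layout is an assumption taken from the AHCI 1.3
 * and SATA register descriptions, not copied from the FreeBSD ata(4) driver.
 */
#define FBS_EN        0x00000001u /* PxFBS bit 0: FIS-based switching enable */
#define FBS_DEV_SHIFT 8           /* PxFBS bits 11:8: device to issue */
#define SC_DET_MASK   0x0000000fu /* SControl bits 3:0: DET field */
#define SC_DET_IDLE   0x00000000u /* DET = 0: no reset requested */

/* Value written in experiment 1: select a port-multiplier port, set EN. */
static uint32_t
fbs_value(unsigned port)
{
	assert(port <= 15); /* DEV is a 4-bit field */
	return (((uint32_t)port << FBS_DEV_SHIFT) | FBS_EN);
}

/* Condition tested in experiment 2: is the DET field idle? */
static int
det_is_idle(uint32_t scontrol)
{
	return ((scontrol & SC_DET_MASK) == SC_DET_IDLE);
}
```

Under these assumptions, fbs_value(15) is 0x00000f01, which would correspond to the "software reset port 15" step in the log above, and det_is_idle() looks only at the low four bits of the register value.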
Thanks.
--
Alexander Motin