problems with AHCI on FreeBSD 8.2

Scott Long scottl at
Wed Feb 15 00:15:14 UTC 2012

On Feb 14, 2012, at 4:34 PM, Victor Balada Diaz wrote:

> On Tue, Feb 14, 2012 at 03:09:58PM -0800, Jeremy Chadwick wrote:
>> On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
>>> On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
>>>> Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
>>>>> On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
>>>>>> Hello,
>>>>>> I have a very similar problem with AHCI on FreeBSD 8.2, and it still
>>>>>> persists on FreeBSD 9.0 release.
>>>>>> Switching from ahci to ataahci resolved the problem for me too.
>>>>>> I'm using gmirror for swap, system is on a zpool and the problem first
>>>>>> occurred during a zpool scrub, but it is easily reproducible with dd.
>>>>>> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
>>>>>> of=/dev/null is not an issue.
>>>>>> Sometimes I need to power off the server because after a reboot one disk
>>>>>> is still missing.
>>>>>> I really would like to help in this issue, so let me know if you need
>>>>>> any more information.
>>>>> I find it interesting that, at least so far, the only people reporting
>>>>> problems of this type with the ahci.ko driver are people using Samsung
>>>>> disks.  The only difference is that your models are F1s while the OP's
>>>>> are F2s.
>>>> I saw such timeouts long ago and mav@ had a look at my postings and he
>>>> mentioned it could be an NCQ problem.
>>>> I suspected the disks firmware.
>>>> I never tracked it down further, because replacing the Samsung (F3
>>>> in that case) disks with Hitachi ones solved all my problems and gave a
>>>> big performance boost as well (with ZFS).
>>>> You can find the discussion here:
>>> You gave me a good idea: try to disable NCQ and see if that's the fault. So
>>> I went and applied the attached patch. After it, I can no longer reproduce
>>> the issue with the ahci driver.
>>> I know this is not a solution because it disables NCQ at the controller level
>>> instead of the disk level, but at least we know for sure where the problem is.
>>> I think the solution would be to add a new quirk ADA_Q_NONCQ in sys/cam/ata/ata_da.c.
>>> The quirks infrastructure is already built, so adding a new quirk for this seems
>>> easy.
>>> Is anyone interested? Do you think there is a better solution?
>>> If anyone is interested, I can build a patch to add an ADA_Q_NONCQ quirk and add my drives
>>> to it.
>> I took a stab at this, but I don't feel confident this is the proper
>> solution/method.  I worry there's some sort of chicken-or-the-egg
>> condition here (quirk setup/matching comes *after* SATA capabilities
>> detection), or that it makes the code messier.  Need mav@'s
>> recommendations on this.
>> Below is for RELENG_8.  I should note I haven't tested if this works, or
>> even compiles -- normally I don't provide such patches without testing
>> so I apologise in advance / user beware.
> You're amazingly fast. Thanks for all your help :)
> You start applying the quirks before 
>        snprintf(announce_buf, sizeof(announce_buf),
>            "", periph->unit_number);
>        quirks = softc->quirks;
>        TUNABLE_INT_FETCH(announce_buf, &quirks);
> So you're breaking quirk setting at boot time.
> See my attached patch. I can confirm it works for me.
> Regards.
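Victor's objection is about ordering. The driver should take the defaults from the quirk table, then let the boot-time TUNABLE_INT_FETCH() override replace them, and only after that act on the final value. Jeremy's patch acted on the quirks before that override was read. The sketch below is hypothetical and only models that ordering. ADA_Q_NONCQ is the quirk proposed in this thread, not an existing flag. The names match_quirks(), register_disk() and struct disk_softc are invented for the example. The simple prefix match stands in for the kernel's real quirk-matching code, and the table entry is a placeholder, not a verified list of affected drives.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk bits: ADA_Q_NONCQ is only a proposal in this thread. */
#define ADA_Q_NONE  0x00u
#define ADA_Q_NONCQ 0x01u

struct quirk_entry {
    const char *model_prefix;   /* simplified prefix match, not CAM's matching */
    unsigned int quirks;
};

/* Placeholder entry; not a verified list of affected drives. */
static const struct quirk_entry quirk_table[] = {
    { "SAMSUNG HD", ADA_Q_NONCQ },
};

unsigned int
match_quirks(const char *model)
{
    for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        const char *p = quirk_table[i].model_prefix;
        if (strncmp(model, p, strlen(p)) == 0)
            return quirk_table[i].quirks;
    }
    return ADA_Q_NONE;
}

struct disk_softc {
    unsigned int quirks;
    bool ncq_enabled;
};

/*
 * tunable == NULL models "no loader tunable set"; otherwise it models
 * TUNABLE_INT_FETCH() overwriting the table-derived value.
 */
void
register_disk(struct disk_softc *sc, const char *model, bool drive_has_ncq,
    const int *tunable)
{
    /* 1. Defaults from the quirk table. */
    sc->quirks = match_quirks(model);
    /* 2. Boot-time override, read before anything acts on the quirks. */
    if (tunable != NULL)
        sc->quirks = (unsigned int)*tunable;
    /* 3. Only now derive behaviour from the final quirk set. */
    sc->ncq_enabled = drive_has_ncq && (sc->quirks & ADA_Q_NONCQ) == 0;
}
```

Because step 3 runs last, setting the tunable to 0 at boot re-enables NCQ on a drive that the table would otherwise quirk.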

I don't think that disabling NCQ entirely is the right solution.  It's a tag starvation issue in the firmware, not a complete failure, and it can be dealt with in the CAM XPT scheduler fairly efficiently.  Alexander and I talked about this recently, and though we differ on the details, a tag hack is not in order, IMHO.  In the short term, try just using "camcontrol tags ada0 -N 1" to limit the number of concurrent commands to 1.
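The sketch below shows one way a scheduler could cope with tag starvation without turning NCQ off. It is a hypothetical model, not the CAM XPT code, and not necessarily what Scott and Alexander have in mind. The names struct tagq, tagq_dispatch() and tagq_complete() are invented, as is the age threshold max_age. The openings field plays the role of the limit set by "camcontrol tags ... -N". When the oldest outstanding command has waited longer than max_age, the sketch stops dispatching new commands so the drive can finish the starved one. Time is assumed to be a monotonic tick counter.

```c
#include <assert.h>   /* used by the usage checks */
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TAGS 32

/* Hypothetical per-device tag queue; not CAM's actual data structure. */
struct tagq {
    int openings;                  /* like "camcontrol tags -N" */
    unsigned long max_age;         /* assumed starvation threshold, in ticks */
    size_t outstanding;
    bool busy[MAX_TAGS];
    unsigned long issued_at[MAX_TAGS];
};

void
tagq_init(struct tagq *q, int openings, unsigned long max_age)
{
    if (openings < 1)
        openings = 1;
    if (openings > MAX_TAGS)
        openings = MAX_TAGS;
    q->openings = openings;
    q->max_age = max_age;
    q->outstanding = 0;
    for (int i = 0; i < MAX_TAGS; i++) {
        q->busy[i] = false;
        q->issued_at[i] = 0;
    }
}

/* Issue time of the oldest outstanding command (ULONG_MAX if none). */
static unsigned long
tagq_oldest(const struct tagq *q)
{
    unsigned long oldest = ULONG_MAX;
    for (int i = 0; i < q->openings; i++)
        if (q->busy[i] && q->issued_at[i] < oldest)
            oldest = q->issued_at[i];
    return oldest;
}

bool
tagq_can_dispatch(const struct tagq *q, unsigned long now)
{
    if (q->outstanding >= (size_t)q->openings)
        return false;
    /* Stop feeding the drive while an old tag is still pending. */
    if (q->outstanding > 0 && now - tagq_oldest(q) > q->max_age)
        return false;
    return true;
}

/* Returns the tag used, or -1 if the command must wait. */
int
tagq_dispatch(struct tagq *q, unsigned long now)
{
    if (!tagq_can_dispatch(q, now))
        return -1;
    for (int i = 0; i < q->openings; i++) {
        if (!q->busy[i]) {
            q->busy[i] = true;
            q->issued_at[i] = now;
            q->outstanding++;
            return i;
        }
    }
    return -1;
}

void
tagq_complete(struct tagq *q, int tag)
{
    if (tag >= 0 && tag < q->openings && q->busy[tag]) {
        q->busy[tag] = false;
        q->outstanding--;
    }
}
```

With openings set to 1, this reduces to the short-term workaround. Only one command is in flight at a time, so no tag can be starved, at the cost of losing NCQ's parallelism.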


More information about the freebsd-stable mailing list