9.1-stable: ATI IXP600 AHCI: CAM timeout
Oliver Fromme
olli at lurza.secnetix.de
Wed May 29 18:44:13 UTC 2013
Ian Lepore wrote:
> On Wed, 2013-05-29 at 16:21 +0200, Oliver Fromme wrote:
> > Steven Hartland wrote:
> > > Have you checked your sata cables and psu outputs?
> > >
> > > Both of these could be the underlying cause of poor signalling.
> >
> > I can't easily check that because it is a cheap rented
> > server in a remote location.
> >
> > But I don't believe it is bad cabling or PSU anyway, or
> > otherwise the problem would occur intermittently all the
> > time if the load on the disks is sufficiently high.
> > But it only occurs at tags=3 and above. At tags=2 it does
> > not occur at all, no matter how hard I hammer on the disks.
> >
> > At the moment I'm inclined to believe that it is either
> > a bug in the HDD firmware or in the controller. The disks
> > aren't exactly new, they're 400 GB Samsung ones that are
> > several years old. I think it's not uncommon to have bugs
> > in the NCQ implementation in such disks.
> >
> > The only thing that puzzles me is the fact that the problem
> > also disappears completely when I reduce the SATA rev from
> > II to I, even at tags=32.
>
> It seems to me that you dismiss signaling problems too quickly.
> Consider the possibilities... A bad cable leads to intermittent errors
> at higher speeds. When NCQ is disabled or limited the software handles
> these errors pretty much transparently. When NCQ is not limited and
> there are many outstanding requests, suddenly the error handling in the
> software breaks down somehow and a minor recoverable problem becomes an
> in-your-face error.
>
> I'm not saying any of the foregoing is true, just that you should
> consider the possibility that you're dealing with multiple problems
> which are only loosely coupled, but together can seem like a single more
> serious problem. You don't know enough yet to casually dismiss
> anything.

Well ... I also can't dismiss the possibility that there is
a mouse in the machine that is pulling the SATA cables twice
every minute. :-)

But seriously ... I don't see how bad cabling could cause
errors at tags=3 and no errors at all at tags=2. It shouldn't
make a difference for the cables if there are two or three
tags used. And by the way, it doesn't make a difference at
all whether I use tags=3 or tags=32; the rate of errors is the
same in both cases (about two per minute during buildworld).

I have googled a bit; the Samsung HD401LJ and HD403LJ don't
seem to be innocent ... There are lots of pages mentioning
problems with NCQ and SATA I vs. II.

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Handelsregister: Amtsgericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsreg.: Amtsgericht München,
HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen/-Produkte + mehr: http://www.secnetix.de/bsd
"A misleading benchmark test can accomplish in minutes
what years of good engineering can never do." -- Dilbert (2009-03-02)