problems with AHCI on FreeBSD 8.2

Jeremy Chadwick freebsd at jdc.parodius.com
Wed Feb 15 19:19:34 UTC 2012


On Wed, Feb 15, 2012 at 07:17:57PM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 06:16:01AM -0800, Jeremy Chadwick wrote:
> > Thanks.  Both your drives look overall fine, sort-of.  I'll outline my
> > concern points, and ask for some more info:
> > 
> > * ada0 has 28 CRC errors, while ada1 has 2.  These drives have been in
> > use for 4688 hours and 4583 hours (respectively), which is roughly 6
> > months for each drive.  CRC errors usually result in transparent
> > retransmits, but this can sometimes cause I/O delays (especially if the
> > CRC errors are repeated).
> > 
> > If the timeout messages recur in the future, please run the commands I
> > gave you above once more and provide the output.  I can then compare the
> > old to the new and see if there is anything of interest.
> 
> I've made it fail again. You can see the smartctl -a output. CRC errors are
> increasing, but I'm not sure what that really means. Is the HD broken? Both?
> At the same time?

CRC errors indicate one of the following, in no particular order:

* Physical cabling problems (too many possible causes to list here)
* Dirty/dusty SATA connectors (cables/drive/host controller)
* Electrical interference (badly shielded cables, etc.)
* Physical electronic/electrical problems (disk PCB, host controller
  PCB, etc.)

The important thing to remember about CRC errors is that they indicate a
hardware-level problem on the link between the host controller and the
controller chip on the drive.  They do not indicate problems with the
drive's cache (those are tracked in attribute 184), and they do not
indicate software-level problems (e.g. driver bugs).

I have no real advice for tracking this kind of problem down.  The most
common response is "replace cables", but cables aren't necessarily the
root cause.  I don't know how to track down interference issues, or how
to truly examine a disk PCB or controller PCB for the last item in the
list above.  "Flaky traces" on a PCB could cause this sort of thing.
Folks in the EE field would know more about these issues; I am not an EE
person.

Since the attribute increased on both drives simultaneously (I have to
assume simultaneously?), the problem is more likely to be the controller
on the motherboard than the SATA cables or the drives.  I'd recommend
replacing the motherboard.  I make no guarantees this will fix anything,
but it is the "common point" for both of your drives.
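
For anyone wanting to do the old-versus-new comparison themselves, here
is a rough Python sketch.  It is illustrative only and not part of the
original exchange.  Assumptions: the CRC counter is SMART attribute 199
(commonly shown by smartctl as UDMA_CRC_Error_Count; the thread does not
give the ID), the input is saved text from "smartctl -a" or "smartctl -A",
and all function names are hypothetical.

```python
CRC_ATTRIBUTE_ID = 199  # assumption: UDMA_CRC_Error_Count on most SATA drives


def parse_smart_attributes(text):
    """Return {attribute ID: raw value} from smartctl attribute-table text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if (
            len(fields) >= 10
            and fields[0].isdigit()
            and fields[2].startswith("0x")
            and fields[9].isdigit()
        ):
            attrs[int(fields[0])] = int(fields[9])
    return attrs


def crc_delta(old_text, new_text, attr_id=CRC_ATTRIBUTE_ID):
    """Return how much the CRC counter grew, or None if it is missing."""
    old = parse_smart_attributes(old_text)
    new = parse_smart_attributes(new_text)
    if attr_id not in old or attr_id not in new:
        return None
    return new[attr_id] - old[attr_id]


def common_point_suspected(deltas):
    """True if two or more drives were checked and every counter increased.

    `deltas` maps a drive name (e.g. "ada0") to its crc_delta() result.
    """
    known = [d for d in deltas.values() if d is not None]
    return len(known) >= 2 and len(known) == len(deltas) and all(d > 0 for d in known)
```

Feed crc_delta() the saved output from before and after a timeout for
each drive, then pass the per-drive results to common_point_suspected().
A True result only says the counters rose on every drive between the two
snapshots; it does not prove the increases happened at the same moment,
nor that the controller is at fault.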

There really isn't anything else I can do going forward.  This is pretty
much where the buck stops for me, and is validation as to why each and
every problem/issue has to be handled individually.

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |



More information about the freebsd-stable mailing list