Serious issue with SATA disks again

Anthony Atkielski atkielski.anthony at wanadoo.fr
Sat Mar 19 01:38:15 PST 2005


I'm still getting errors like this:

ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=5601695
ad10: FAILURE - WRITE_DMA timed out
ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848803
ad10: FAILURE - WRITE_DMA timed out
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=5618815
ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848959
ad10: FAILURE - WRITE_DMA timed out
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=4472607
ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4860959
ad10: FAILURE - WRITE_DMA timed out
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=4861087
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=4861695

Yesterday, for the first time, the system crashed (ungracefully) after
some of these errors occurred, and I had to reset the system manually;
fsck had to correct errors after boot.

I need to know what is causing these problems.  They have been reported
for a year by various people on various configurations (different
motherboards and chipsets).  I've seen lots of complaints and reports,
but no solutions.  It's not hardware, so don't bother suggesting that
unless you can _prove_ that the OS can be ruled out as the cause.

Doesn't anyone actually know how FreeBSD works?  Someone wrote the code
that prints the above cryptic messages.  What do they mean, _exactly_?

These errors occur most often while I'm running a Perl program (awstats)
to analyse web logs.  That may explain why the LBAs seem to be in the
same region.  ad10 contains /tmp and /var; ad12 (which doesn't seem to
show the error messages) contains /usr.  The root filesystem and swap
are on a different drive entirely.
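
To check whether the LBAs really do cluster in one region, the messages
can be grouped by device, severity and command.  What follows is only a
rough sketch: it assumes the kernel messages have been saved to a text
file in exactly the format quoted above, and the names LINE_RE and
summarize are made up for illustration.  The FAILURE lines carry no LBA,
so the pattern skips them.

```python
import re
import sys
from collections import defaultdict

# Matches messages in the format quoted above, e.g.
#   ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=5601695
# Lines without an LBA (the FAILURE lines) do not match and are ignored.
LINE_RE = re.compile(
    r"^(?P<dev>ad\d+): (?P<level>TIMEOUT|WARNING|FAILURE) - "
    r"(?P<cmd>[A-Z_]+)\b.*?LBA=(?P<lba>\d+)"
)


def summarize(lines):
    """Group LBAs by (device, level, command); return (min, max, count)."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            key = (m.group("dev"), m.group("level"), m.group("cmd"))
            groups[key].append(int(m.group("lba")))
    return {k: (min(v), max(v), len(v)) for k, v in groups.items()}


if __name__ == "__main__":
    # Usage (hypothetical file name): python lba_summary.py < messages.txt
    for key, (lo, hi, count) in sorted(summarize(sys.stdin).items()):
        dev, level, cmd = key
        print(f"{dev} {level:8} {cmd:10} n={count} LBA {lo}..{hi}")
```

A narrow LBA range per device would be consistent with the errors
following one workload's files; it does not by itself say whether the
cause is the driver, the controller, the cable or the disk.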

I'm beginning to get the impression that support for disks is rather
weak in FreeBSD 5.x.  I have mysterious SCSI errors on one machine that
nobody seems to have any clue about, and mysterious SATA errors on
another machine that nobody seems to have any clue about.  I can't
really brag about the reliability or uptime of the OS if it crashes once
a week due to unresolved bugs in disk-handling code.

-- 
Anthony



