Help debugging DMA_READ errors

Jeremy Chadwick koitsu at FreeBSD.org
Tue Sep 16 17:59:01 UTC 2008


On Tue, Sep 16, 2008 at 10:04:52AM -0700, Clint Olsen wrote:
> Ok, I've had some flakiness with my 6.3-STABLE (Sun May 25 21:55:57 PDT
> 2008) box.  I assume that these errors are indicative of a system-level
> problem rather than a single disk:

Not necessarily, but FreeBSD makes debugging this kind of situation
fairly difficult.  It takes time and a lot of patience.  If the problem
is easily reproducible, that can significantly help.

> Event 1
> -------
> Sep 14 05:12:54 belle kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=216477719
> 
> Result: Hard reset required
>
> Event 2
> -------
> Sep 16 02:11:09 belle kernel: ad4: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=172088735
> Sep 16 02:13:08 belle kernel: acd0: WARNING - READ_TOC taskqueue timeout - completing request directly
> Sep 16 02:13:09 belle kernel: acd0: timeout waiting for ATAPI ready
> Sep 16 02:13:09 belle kernel: acd0: error issuing ATA PACKET command
> Sep 16 02:13:09 belle kernel: acd0: WARNING - READ_TOC freeing taskqueue zombie request
> Sep 16 02:13:09 belle kernel: acd0: timeout waiting for ATAPI ready
> Sep 16 02:13:09 belle kernel: acd0: error issuing ATA PACKET command
> ...last two repeating until reset...
> 
> Result: Hard reset required

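The ad4 error is not the same kind of failure as the ad0 timeout
earlier.  An ICRC error is an interface CRC error: the data was
corrupted in transit between the disk and the controller, which usually
points at cabling or the controller rather than the disk itself.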

acd0 is a CD/DVD drive.  ad4 is a hard disk.  What exactly were you
doing with the system at the time these errors appeared?  Were you using
the CD/DVD drive?  Was there a disc in the drive that was mounted?
If not, I'm baffled as to what would be accessing acd0 and causing
what you see here.

I have a feeling all of these might be driven off of a single
southbridge controller, which could be going bad, or "wedged" in some
way.  You've now seen errors on ad0 (PATA device), ad4 (SATA device),
and acd0 (unknown, but probably a PATA device).
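
To see at a glance which devices are logging errors and how often, the
kernel messages can be tallied per device.  A minimal sketch, assuming
syslog-format lines like the ones quoted above (the function name
ata_error_counts is made up for illustration):

```shell
# Tally kernel ATA messages per device from syslog-format input on stdin.
# Assumes lines of the form: "Mon DD HH:MM:SS host kernel: adN: ..."
ata_error_counts() {
    awk '$5 == "kernel:" && $6 ~ /^(ad|acd)[0-9]+:$/ {
             dev = $6
             sub(/:$/, "", dev)      # drop the trailing colon
             count[dev]++
         }
         END { for (dev in count) print dev, count[dev] }' | LC_ALL=C sort
}
```

Something like "ata_error_counts < /var/log/messages" would then print
one line per device with its message count.  It counts every kernel line
from those devices, including the boot-time probe lines, so treat the
numbers as rough.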

> Disk configuration:
> 
> ad0: 114473MB <WDC WD1200JB-32EVA0 15.05R15> at ata0-master UDMA100
> ad4: 114473MB <WDC WD1200JD-00GBB0 02.05D02> at ata2-master SATA150
> ad6: 476940MB <Seagate ST3500641AS 3.AAJ> at ata3-master SATA150

Can you please provide full details of what these disks are connected
to?  I'd like to see dmesg output for ata0, ata2, and ata3, as well as
the atapci devices those ataX devices are attached to, along with
vmstat -i output.  Are there any other errors in your logs around
that time (e.g. watchdog timeouts of any kind on network devices)?
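
One way to pull just the relevant lines out of dmesg is a simple filter
on the device name at the start of each line.  This is only a sketch;
the function name ata_lines is hypothetical, and the pattern assumes the
usual "name+unit:" prefix that the kernel prints for each device:

```shell
# Keep only lines that start with an ATA-related device name
# (atapciN, ataN, adN, acdN) followed by a colon.  Reads stdin.
ata_lines() {
    grep -E '^(atapci|ata|ad|acd)[0-9]+:'
}
```

On the affected machine this would be used as "dmesg | ata_lines", with
the output of "vmstat -i" sent along separately.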

Additionally, it would be very useful if you could install
ports/sysutils/smartmontools and provide the following output:

# smartctl -a /dev/ad0
# smartctl -a /dev/ad4

This will help determine whether either of the disks saw the DMA errors
reported, whether the disks are going bad, whether your machine somehow
lost power briefly, or whether you might have a voltage/PSU problem of
some kind.
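
When reading the smartctl output, a few attributes tend to be the most
relevant to the questions above.  A sketch that picks them out of the
attribute table, assuming the usual ten-column layout printed by
"smartctl -A" with the raw value starting in the tenth column (the
function name smart_key_attrs is hypothetical, and not every drive
reports all of these attributes):

```shell
# Print the name and the first token of the raw value for a handful of
# SMART attributes.  Reads "smartctl -A" style output on stdin.
smart_key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count|Power_Cycle_Count)$/ {
             print $2, $10
         }'
}
```

For example: "smartctl -A /dev/ad4 | smart_key_attrs".  A non-zero
UDMA_CRC_Error_Count means the drive itself has seen interface CRC
errors, while reallocated or pending sectors point more toward the disk
going bad.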

> I'm using one of those eSATA converter brackets in the back of the machine
> for ad6.  I'm guessing this doesn't have to do with this problem since that
> disk wasn't mentioned.

I can't say for certain.  The above information will help.

> Any advice you can offer will be much appreciated.

The best advice I can give you is the above, combined with the Wiki
document below, which I've been putting together as time permits.  It
is in no way complete, and it may simply raise more questions than it
answers.

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

The bottom line is that, if the problems you're seeing are the "same
thing" others are seeing, then you are not alone.  As I said initially,
finding the source of these problems is difficult, and they are often
"unique" to each individual's machine.  For some, replacing cables, the
entire motherboard, disk controller, or just the PSU helped; for others, 
the problem disappeared on its own; in other cases, the problem was
so severe that they ended up switching to Linux.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-stable mailing list