ATA_DMA errors (and fs corruption!)

Tony Byrne freebsd-current at byrnehq.com
Mon Jun 20 10:09:48 GMT 2005


Hello twesky,

t> atapci0: <Intel ICH4 UDMA100 controller> port
t> 0x1860-0x186f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on
t> pci0
t> ata0: channel #0 on atapci0
t> ata1: channel #1 on atapci0

t> The last known good stable version for me was aprox April 25, my next
t> cvsup was May 17, but I have problems with 5.4 Release so I assume
t> (probably incorrectly) that something changed between April 25 and
t> 5.4R.

t> I don't exactly recall my shutdown errors, but I did have to restore
t> my file systems to get my laptop back to a functioning state.

We've been seeing the same problem in a server equipped with an Intel
ICH5 controller and a SATA hard disk. The problems seemed to start after
an update in mid-May. We noticed that processes such as our IMAP
server would stall for a few seconds and the console would indicate
either a READ_DMA or WRITE_DMA timeout.  On two occasions the disk
became detached, requiring a reboot.  The frequency of these timeouts
was such that we couldn't do any work with the server.

We didn't have this problem prior to the update. We are tracking
RELENG_5, but have now reverted to a May 9th kernel, which doesn't
seem to be quite so fussy and has reduced the problem to a handful of
timeouts every day.
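
For anyone who wants to compare kernels the same way, here is a rough
sketch of how the timeouts could be tallied per day from a syslog file.
It is only an illustration: the function names are made up, and it
assumes that a timeout line contains the word "timeout" (in any case)
together with READ_DMA or WRITE_DMA, and that each line starts with a
syslog-style "Mon DD" date like the dmesg excerpts in this message. The
exact wording of the kernel's timeout message is not assumed.

```python
import re
from collections import Counter

# Assumption: a DMA timeout line mentions one of these operation names.
DMA_OP_RE = re.compile(r"\b(READ_DMA|WRITE_DMA)\b")


def count_dma_timeouts(lines):
    """Return a Counter keyed by (day, operation) for DMA timeout lines.

    `day` is the first two whitespace-separated fields of the line,
    e.g. "Jun 20", which is the usual syslog date prefix.
    """
    counts = Counter()
    for line in lines:
        # Skip lines that do not mention a timeout at all.
        if "timeout" not in line.lower():
            continue
        match = DMA_OP_RE.search(line)
        if match is None:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        day = " ".join(fields[:2])
        counts[(day, match.group(1))] += 1
    return counts


def report(path):
    """Print one line per (day, operation) pair found in the log at `path`."""
    with open(path, errors="replace") as fh:
        counts = count_dma_timeouts(fh)
    for (day, op), n in sorted(counts.items()):
        print(f"{day}  {op:<9}  {n}")
```

Calling report() on the system log (typically /var/log/messages) under
each kernel gives a per-day count to set side by side.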

What's bugging me is that this list has been very quiet about this
problem. The Intel ICH* controllers must be common in the field and
I'm surprised that this problem has gone unnoticed. Of course, there
can be hardware reasons for timeouts such as a dying disk or cable,
but I think we've eliminated these in our case. The disk works fine
when transferred to another machine and the SATA cable works fine when
used with another disk (albeit one of smaller capacity) in the server.
So we've come to the conclusion that it's the combination of
controller, disk and FreeBSD version that holds the key to this.

Jun 20 10:20:04 roo kernel: atapci0: <Intel ICH5 SATA150 controller> port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.2 on pci0
Jun 20 10:20:04 roo kernel: ata0: channel #0 on atapci0
Jun 20 10:20:04 roo kernel: ata1: channel #1 on atapci0

...

Jun 20 10:20:04 roo kernel: ad0: 190782MB <WDC WD2000JD-00FYB0/02.05D02> [387621/16/63] at ata0-master SATA150
Jun 20 10:20:04 roo kernel: acd0: CDROM <SAMSUNG CD-ROM SC-152G/C400> at ata1-master PIO4

...



Regards,

Tony.

-- 
Tony Byrne



