Another twist on WRITE_DMA issues...

Garance A Drosihn drosih at rpi.edu
Thu Dec 2 18:31:41 PST 2004


In a different thread, I (garance) wrote:
>
>At 10:30 PM +0100 11/18/04, Søren Schmidt wrote:
>>Garance A Drosihn wrote:
>>
>>>I am trying to pin down problems "FAILURE - WRITE_DMA timed out"
>>>in a new PC that I have.  I had some local shop build this for me,
>>>and apparently there were "a few" miscommunications in what I
>>>thought I asked for, and what they actually built.
>>>
>>>The machine ended up with two SATA controllers:
>>>    atapci0: <SiI 3112 SATA150 controller> -- on the motherboard
>>>    atapci1: <VIA 6420 SATA150 controller> -- on a PCI card
>>
>>I think it's the other way around, the VIA chip is part of the
>>motherboard chipset, the SiI is a "loose" PCI compatible chip.
>
>Ugh.  You are correct.  Somewhere along the line I got the two
>mixed up.  So now I have removed the PCI-based SATA card, and
>connected the Western Digital hard disk to the on-board SATA.
>I have just done a complete buildworld/installworld cycle for
>5.3-STABLE.  I did not see a single WRITE_DMA time-out message.

So far so good.

>But looking around the web for a while, it looks like this model of
>Western Digital is not a native SATA drive.  So I think I will
>replace it just to avoid any further hassles, even though I did not
>get any errors with this drive once I was using the right controller.

I have now switched from that Western Digital drive to a Seagate
Barracuda 7200.7 120-gig (ST3120026AS).  The drive seems to be
working fairly well, but now I sometimes see some combination
like the following three lines:

Dec  2 20:29:50 kernel: Interrupt storm detected on
                         "irq20: atapci0"; throttling interrupt source
Dec  2 20:29:54 kernel: ad4: TIMEOUT - WRITE_DMA retrying
                         (2 retries left) LBA=20627679
Dec  2 20:29:54 kernel: ad4: FAILURE - WRITE_DMA timed out

Where atapci0: <VIA 6420 SATA150 controller>
And
ad4: 114473MB <ST3120026AS/3.56> [232581/16/63] at ata2-master SATA150
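
For anyone who wants to see how often these messages turn up, here is a
minimal sketch in Python that tallies them per interrupt source and per
disk.  It assumes each kernel message sits on a single line in the log
(the lines above were wrapped for mail), and that the wording matches
the three messages quoted here.  The names PATTERNS and tally are made
up for this illustration.

```python
import re
from collections import Counter

# Patterns modelled on the three messages quoted above. The exact wording
# on other FreeBSD releases may differ (assumption).
PATTERNS = {
    "interrupt_storm": re.compile(
        r'Interrupt storm detected on\s+"(?P<src>[^"]+)"'),
    "dma_retry": re.compile(
        r'(?P<dev>ad\d+): TIMEOUT - WRITE_DMA retrying'),
    "dma_failure": re.compile(
        r'(?P<dev>ad\d+): FAILURE - WRITE_DMA timed out'),
}


def tally(lines):
    """Count matching messages, keyed by (kind, interrupt source or disk)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                groups = match.groupdict()
                # Each pattern defines exactly one of "src" or "dev".
                key = groups.get("src") or groups.get("dev")
                counts[(kind, key)] += 1
    return counts
```

Calling tally() on an open log file (on a stock system that would
probably be /var/log/messages, but that is an assumption about the
syslog setup) returns a Counter such as
{("dma_failure", "ad4"): 1, ...}, which makes it easier to tell whether
the failures cluster around installworld runs.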

This does not come up often, and it usually doesn't cause any
noticeable problem.  As luck would have it, the one time it has
caused problems is during installworlds.  I just did 18 buildworlds
in a row without any problem.  I built and installed the new kernel,
rebooted into single-user, and the system panicked early in the
installworld.  I rebooted into single-user again, and this time it
was *almost* finished with installworld when the system simply hung
after an "ad4: FAILURE - WRITE_DMA timed out" message.

Now I'm back up in multi-user mode, and I just completed another
buildworld without any problem.  I did get the above set of messages,
but nothing after that.  (I did see several sets of WRITE_DMA error
messages during the installworlds).

This is on a recent snapshot of 5.3-STABLE.  Should I just switch
back to the Western Digital?  Or is it that the new disk is fast
enough that the kernel *thinks* something is wrong with it, and
starts throttling it?  Or maybe I have a bad SATA cable?  If it
wasn't for the panics/hangs during installworld, I would think that
everything was working quite well.  Of course, that is about the
worst time to be getting system panics!

I tried getting a core dump of the panic, but 'call doadump' complained
that no dump device had been set.  I'm now looking at /etc/rc.d/dumpon
so I should know how to set that up the next time I'm in single-user
mode.

-- 
Garance Alistair Drosehn            =   gad at gilead.netel.rpi.edu
Senior Systems Programmer           or  gad at freebsd.org
Rensselaer Polytechnic Institute    or  drosih at rpi.edu

