Bad disk or kernel (ATA Driver) problem?
jsimola at gmail.com
Wed Jan 19 15:06:05 PST 2005
On Wed, 19 Jan 2005 13:33:12 -0800, Jon Simola <jsimola at gmail.com> wrote:
> I've got a few 1U Supermicro boxes running dual SATA drives:
> I've run into all sorts of problems with every one, and changing the
> IDE channel settings in the BIOS always fixes it. Which really annoys
> me, because I setup a new box, run it for a couple weeks, then the
> drives start getting flaky under load. Then I go change the setting in
> the BIOS (that I always forget to do on initial setup) and it's dead
> stable for months at a time.
I was politely asked to actually dig up the settings, which cut
through my lack of sleep. I should have done this earlier :)
On this one box (Supermicro SuperServer 5013C-T, P4SCE BIOS v1.2c):
atapci0: <Intel ICH5 SATA150 controller> port 0xf000-0xf00f,0-0x3,0-0x7,0-0x3,0-0x7 irq 16 at device 31.2 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
GEOM: create disk ad0 dp=0xc671a560
ad0: 70911MB <WDC WD740GD-00FLA0> [144073/16/63] at ata0-master UDMA100
GEOM: create disk ad1 dp=0xc671a460
ad1: 70911MB <WDC WD740GD-00FLA0> [144073/16/63] at ata0-slave UDMA100
acd0: CDROM <CD-224E> at ata1-master PIO4
That's a pair of SATA 74GB WD Raptors. The BIOS IDE setting is
"Combined": the SATA drives appear on the Primary IDE channel.
On a different box (Supermicro SuperServer 5013C-T, P4SCE BIOS v1.2c):
atapci0: <Intel ICH5 UDMA100 controller> port
0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <Intel ICH5 SATA150 controller> port
irq 18 at device 31.2 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
acd0: CDROM <CD-224E/1.9A> at ata1-master UDMA33
ad4: 78167MB <Maxtor 6Y080M0/YAR51HW0> [158816/16/63] at ata2-master SATA150
ad6: 78167MB <Maxtor 6Y080M0/YAR51HW0> [158816/16/63] at ata3-master SATA150
A pair of Maxtor 80GBs; the BIOS is set to "Enhanced", up to 6 drives
(4 IDE + 2 SATA).
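For anyone wanting to check which way a box is currently probing, here is a
rough Python sketch that reads the "adN:" lines out of dmesg text like the two
listings above. The function names (classify_disks, looks_combined) are made up
for illustration, and the heuristic is only an assumption: it treats a disk
reported on ata0/ata1 in a UDMA mode as the "Combined" mapping, which only
holds on a box known to contain nothing but SATA disks.

```python
import re

# Matches disk probe lines in the format shown in this post, e.g.:
#   ad0: 70911MB <WDC WD740GD-00FLA0> [144073/16/63] at ata0-master UDMA100
DISK_RE = re.compile(
    r"^(ad\d+): \d+MB <([^>]*)> \[[^\]]*\] at (ata\d+)-(master|slave) (\S+)$"
)


def classify_disks(dmesg_text):
    """Return (disk, model, channel, position, mode) for each adN line."""
    disks = []
    for line in dmesg_text.splitlines():
        match = DISK_RE.match(line.strip())
        if match:
            disks.append(match.groups())
    return disks


def looks_combined(disks):
    """Heuristic (assumption): a disk on ata0/ata1 in a UDMA mode suggests
    the BIOS is presenting SATA drives on the legacy IDE channels."""
    return any(
        channel in ("ata0", "ata1") and mode.startswith("UDMA")
        for _, _, channel, _, mode in disks
    )


if __name__ == "__main__":
    import sys

    found = classify_disks(sys.stdin.read())
    for disk in found:
        print(" ".join(disk))
    print("Combined-style mapping" if looks_combined(found) else "Native SATA mapping")
```

Saved under a hypothetical name such as check_ata.py, it could be fed the
output of dmesg on stdin. It says nothing about whether the drives are about
to go flaky; it only reports how they were probed.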
Crazy as it seems, I wasn't kidding about changing the BIOS.
The other 2 settings are "SATA only" and "Auto". When the drives
started flaking out (timeouts on reads) I would go into the BIOS and
cycle through the BIOS settings. After changing it once or twice,
things would be fine for months at a time.
My best guess is that "something" makes the ICH5 a little flaky,
and twiddling the BIOS clears it somehow. My only evidence supporting
that is that twice the BIOS stalled on probing the drives once this
error had happened, and I had to physically remove the drives, twiddle
the BIOS settings, and replace the drives before it would work again.
On OpenBSD, this problem on the same hardware manifests as a read
timeout failure during the initial boot probes. Same fix: play with
the BIOS and it suddenly works. There's a term in the Jargon file for
this, but I can't recall it at the moment.
More information about the freebsd-stable