ATA tag queuing broken...
ian j hart
ianjhart at ntlworld.com
Fri Apr 25 15:03:26 PDT 2003
On Friday 25 April 2003 6:36 pm, Sean Chittenden wrote:
> > > > > Alright, well it's apparently no surprise to folks that ATA tag
> > > > > queuing is broken at the moment. Are there any objections to me
> > > > > adding a few cautious words to ata(4) and tuning(7) that advise
> > > > > _against_ the use of ata tag queuing given that they're likely the
> > > > > fastest way to reboot a -STABLE box?
> > > > >
> > > > > Here's a PR that I tacked a tad bit of info into:
> > > > >
> > > > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/42563
> > > >
> > > > That's news to me, works just fine here (4.8-R).
> > >
> > > That's what my box is as well. See the bottom of the PR for details,
> > > but an egrep -r via NFS reboots the box consistently as well as a
> > > local CVSup + nice +20 buildworld.
> > Does it die during the cvs or the buildworld? Buildworld is not very disk
> > intensive. If you nice +20, even more so.
> Buildworld + mild disk load as an NFS server and it does okay. CVSup
> with mild disk load is also okay. But if you toss the three together,
> you're sure to get the box to panic.
Numbers would be better :) systat -vm is probably good enough.
If the box is really that flakey you should be able to panic it with some
other dummy load. I'd bet real money (as high as $5) that NFS panics the box
when the underlying disk *goes away* so I'd thrash the living daylights out
of the disks. I'd also bet that you knew this.
Anyway I'm rather partial to
#ls -R / > /dev/null
and different device combinations of
#dd of=/dev/null bs='63*512' if=/dev/ad0
Right now I have KDE running while I type this email, with the ls in one
xterm, a dd of one of the raw disks in another xterm, and systat -vm in a
third xterm, and it's solid as a rock.
FYI, with just one dd I get 1500 tps, 45 MB/s, 97% usage.
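A minimal sketch of running the two loads above side by side from one
script rather than from separate xterms. The function name run_load and
its two arguments are made up for illustration; the block size is just
63*512 = 32256 bytes written out, and the device you pass in is whatever
applies on your box.

```shell
#!/bin/sh
# run_load TREE DISK  (hypothetical helper, not part of any FreeBSD tool)
#   TREE: directory to walk recursively, e.g. /
#   DISK: file or raw device to read sequentially, e.g. /dev/ad0
run_load() {
    # Metadata-heavy walk of the tree; output and errors are discarded.
    ls -R "$1" > /dev/null 2>&1 &
    ls_pid=$!

    # Sequential read in 63-sector blocks (63 * 512 = 32256 bytes).
    dd if="$2" of=/dev/null bs=32256 2> /dev/null &
    dd_pid=$!

    # Wait for both jobs; the function's status is dd's exit status.
    wait "$ls_pid" || true
    wait "$dd_pid"
}
```

Something like "run_load / /dev/ad0" as root would approximate the
ls + dd combination, with systat -vm left running in another terminal
to watch the numbers.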
> > > > What do you mean by "at the moment"? That pr is six months old.
> > >
> > > Agreed, but since there's no voting for bugs in gnats, I figured I'd
> > > "me too" the PR with an updated time/date and slightly more info.
> > >
> > > > Did you check the list first? I sent another "works for me" less
> > > > than a month ago. (Thread: Status of ATA tagging in Stable Kevin
> > > > Oberman 20030329)
> > >
> > > Yup. It "works" in the sense that under low load, the box works. As
> > > soon as I push it, however, it panics and resets.
> > >
> > > > I note that the pr originator also has the *known to be broken* DTLA
> > > > drives.
> > >
> > > Hrm, well, according to the man pages I've got the right stuff... or
> > > not, I don't remember the qualifications mentioned in tuning(7):
> > >
> > > atapci0: <VIA 8233 ATA100 controller> port 0xdc00-0xdc0f at device 17.1 on pci0
> > > ata0: at 0x1f0 irq 14 on atapci0
> > > ata1: at 0x170 irq 15 on atapci0
> > > ad0: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata0-master tagged UDMA100
> > > ad2: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata1-master tagged UDMA100
> > I have very similar hardware. I should be able to reproduce any given
> > disk load. Perhaps we should take this off list and try a few things.
> > Before I go, I should mention that I did have similar "tag" error
> > messages a few weeks ago. I also had a reproducible panic when starting
> > vinum from single user mode. This turned out to be one (or more) of the
> > following.
> > o 1 bad RAM stick
> > o 1 marginal (on spec) RAM stick
> > o Aggressive BIOS settings
> > o Air filters clogged
> > o Unseasonably warm weather (+10F)
> > o Phase of the moon
> Of all of those, it could be either a BIOS setting or RAM, but only
> if hardware is the problem at all. The machine has been running for a
> year and a half, and the panics only started recently (the last 6-9
> months or so).
Well, that's not an exhaustive list; you'd also want to add
o The disks
o The cables (esp length)
o The controller
o The PSU
o everything else
The point I was trying to make was that I didn't immediately post to the list
saying ATA tags are broken (or vinum for that matter). I devoted time, energy
and cash to narrowing down the problem, eliminating alternate causes etc.
From what you've posted the evidence is "anecdotal" and no-one else has come
forward to support it. IMHO that doesn't justify labeling ATA tags as broken.
My copy of man 7 tuning says that this is "new experimental". Isn't that
> > Find a display card and run memtest86 for an hour or so. Take a note
> > of the memory throughput (for BIOS tuning).
> Eh, not so wild about the prospect of the machine dumping given that
> I'm 700+mi away from this particular box, but next time I'm in the
> data center I will... but like I said, I don't think it's hardware.
Not sure what you mean here, re dumping.
So you'll have a serial console setup then? I've not tried it but memtest has
serial console support. Is there a floppy disk and someone on-site with two
ian j hart
Quoth the raven, bite me!
Salem Saberhagen (Episode LXXXI: The Phantom Menace)