ATA tag queuing broken...

ian j hart ianjhart at ntlworld.com
Fri Apr 25 15:03:26 PDT 2003


On Friday 25 April 2003 6:36 pm, Sean Chittenden wrote:
> > > > > Alright, well it's apparently no surprise to folks that ATA tag
> > > > > queuing is broken at the moment.  Are there any objections to me
> > > > > adding a few cautious words to ata(4) and tuning(7) that advise
> > > > > _against_ the use of ata tag queuing given that they're likely the
> > > > > fastest way to reboot a -STABLE box?
> > > > >
> > > > > Here's a PR that I tacked a tad bit of info into:
> > > > >
> > > > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/42563
> > > >
> > > > That's news to me, works just fine here (4.8-R).
> > >
> > > That's what my box is as well.  See the bottom of the PR for details,
> > > but an egrep -r via NFS reboots the box consistently as well as a
> > > local CVSup + nice +20 buildworld.
> >
> > Does it die during the cvs or the buildworld? Buildworld is not very disk
> > intensive. If you nice +20, even more so.
>
> Buildworld + mild disk load as an NFS server and it does okay.  CVSup
> with mild disk load is also okay.  But if you toss the three together,
> you're sure to get the box to panic.
>

Numbers would be better :) systat -vm is probably good enough.

If the box is really that flaky you should be able to panic it with some 
other dummy load. I'd bet real money (as high as $5) that NFS panics the box 
when the underlying disk *goes away* so I'd thrash the living daylights out 
of the disks. I'd also bet that you knew this.

Anyway I'm rather partial to
#ls -R / > /dev/null

and different device combinations of

#dd of=/dev/null bs='63*512' if=/dev/ad0

Anyway I have KDE running, typing this email. The ls in one xterm, dd'ing one 
of the raw disks in another xterm, and systat -vm in a third xterm and it's 
solid as a rock.
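
For anyone wanting to repeat that, the two loads can be wrapped in a small script. This is only a sketch: the DISK and TREE variables and the disk_load name are placeholders of mine, not anything from ata(4), and bs=32256 is just 63*512 written out.

```shell
#!/bin/sh
# Sketch of the combined dummy load described above.
# Assumption: DISK and TREE are illustrative names; the post used /dev/ad0 and /.
DISK="${DISK:-/dev/ad0}"
TREE="${TREE:-/}"

disk_load() {
    # Walk the directory tree in the background (metadata-heavy load).
    ls -R "$TREE" > /dev/null 2>&1 &
    ls_pid=$!

    # Meanwhile read the device sequentially; 32256 = 63*512 bytes per block.
    dd of=/dev/null bs=32256 if="$DISK" 2> /dev/null
    dd_status=$?

    # Let the tree walk finish; its exit status is deliberately ignored.
    wait "$ls_pid" || true
    return "$dd_status"
}
```

Run it as root with something like `DISK=/dev/ad2 disk_load`, with systat -vm open in another terminal to watch the numbers.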

FYI, with just one dd I get 1500tps 45MB/s 97% usage.
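
As a rough sanity check on those figures (assuming each transfer systat counts is one full 63*512 byte block, which I have not verified), the arithmetic works out close to the observed rate:

```shell
# Back-of-envelope check of the systat figures above.
# Assumption: one transfer == one full dd block of 63 sectors.
block=$((63 * 512))               # bytes per transfer
tps=1500                          # transfers per second, as reported
bytes_per_sec=$((block * tps))    # implied throughput in bytes/s
echo "$block $bytes_per_sec $((bytes_per_sec / 1048576))"
# prints "32256 48384000 46"
```

That is about 46 MB/s (counting in units of 2^20 bytes) against the 45MB/s reported, so the tps and throughput numbers agree with each other.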

> > > > What do you mean by "at the moment"? That pr is six months old.
> > >
> > > Agreed, but since there's no voting for bugs in gnats, I figured I'd
> > > "me too" the PR with an updated time/date and slightly more info.
> > >
> > > > Did you check the list first? I sent another "works for me" less
> > > > than a month ago. (Thread: Status of ATA tagging in Stable Kevin
> > > > Oberman 20030329)
> > >
> > > Yup.  It "works" in the sense that under low load, the box works.  As
> > > soon as I push it, however, it panics and resets.
> > >
> > > > I note that the pr originator also has the *known to be broken* DTLA
> > > > drives.
> > >
> > > Hrm, well, according to the man pages I've got the right stuff... or
> > > not, I don't remember the qualifications mentioned in tuning(7):
> > >
> > > atapci0: <VIA 8233 ATA100 controller> port 0xdc00-0xdc0f at device 17.1 on pci0
> > > ata0: at 0x1f0 irq 14 on atapci0
> > > ata1: at 0x170 irq 15 on atapci0
> > > ad0: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata0-master tagged UDMA100
> > > ad2: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata1-master tagged UDMA100
> >
> > I have very similar hardware. I should be able to reproduce any given
> > disk load. Perhaps we should take this off list and try a few things.
> >
> > Before I go, I should mention that I did have similar "tag" error
> > messages a few weeks ago. I also had a reproducible panic when starting
> > vinum from single user mode. This turned out to be one (or more) of the
> > following.
> >
> > o	1 bad RAM stick
> > o	1 marginal (on spec) RAM stick
> > o	Aggressive BIOS settings
> > o	Air filters clogged
> > o	Unseasonably warm weather (+10F)
> > o	Phase of the moon
>
> Of all of those, it could either be a bios setting or ram, but that's
> if that's a problem.  The machine has been running for a year and a
> half and the panics have only been recently (last 6-9mo) or so.

Well that's not an exhaustive list, you'd want to add

o	The disks
o	The cables (esp length)
o	The controller
o	The PSU
o	everything else

The point I was trying to make was that I didn't immediately post to the list 
saying ATA tags are broken (or vinum for that matter). I devoted time, energy 
and cash to narrowing down the problem, eliminating alternate causes etc.

From what you've posted the evidence is "anecdotal" and no-one else has come 
forward to support it. IMHO that doesn't justify labeling ATA tags as broken. 
My copy of man 7 tuning says that this is "new experimental". Isn't that 
enough?

>
> > Find a display card and run memtest86 for an hour or so. Take a note
> > of the memory throughput (for BIOS tuning).
>
> Eh, not so wild about the prospect of the machine dumping given that
> I'm 700+mi away from this particular box, but next time I'm in the
> data center I will... but like I said, I don't think it's hardware.

Not sure what you mean here, re dumping.

So you'll have a serial console setup then? I've not tried it but memtest has 
serial console support. Is there a floppy disk and someone on-site with two 
brain cells?

>
> -sc

-- 
ian j hart

Quoth the raven, bite me!
	Salem Saberhagen (Episode LXXXI: The Phantom Menace)



More information about the freebsd-stable mailing list