ATA tag queuing broken...

Sean Chittenden sean at chittenden.org
Fri Apr 25 10:36:26 PDT 2003


> > > > Alright, well it's apparently no surprise to folks that ATA tag
> > > > queuing is broken at the moment.  Are there any objections to me
> > > > adding a few cautious words to ata(4) and tuning(7) that advise
> > > > _against_ the use of ATA tag queuing given that it's likely the
> > > > fastest way to reboot a -STABLE box?
> > > >
> > > > Here's a PR that I tacked a tad bit of info into:
> > > >
> > > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/42563
> > >
> > > That's news to me, works just fine here (4.8-R).
> >
> > That's what my box is as well.  See the bottom of the PR for details,
> > but an egrep -r via NFS reboots the box consistently as well as a
> > local CVSup + nice +20 buildworld.
> 
> Does it die during the cvs or the buildworld? Buildworld is not very disk 
> intensive. If you nice +20, even more so.

Buildworld + mild disk load as an NFS server and it does okay.  CVSup
with mild disk load is also okay.  But if you toss the three together,
you're sure to get the box to panic.

> > > What do you mean by "at the moment"? That pr is six months old.
> >
> > Agreed, but since there's no voting for bugs in gnats, I figured I'd
> > "me too" the PR with an updated time/date and slightly more info.
> >
> > > Did you check the list first? I sent another "works for me" less
> > > than a month ago. (Thread: Status of ATA tagging in Stable Kevin
> > > Oberman 20030329)
> >
> > Yup.  It "works" in the sense that under low load, the box works.  As
> > soon as I push it, however, it panics and resets.
> >
> > > I note that the pr originator also has the *known to be broken* DTLA
> > > drives.
> >
> > Hrm, well, according to the man pages I've got the right stuff... or
> > not, I don't remember the qualifications mentioned in tuning(7):
> >
> > atapci0: <VIA 8233 ATA100 controller> port 0xdc00-0xdc0f at device 17.1 on pci0
> > ata0: at 0x1f0 irq 14 on atapci0
> > ata1: at 0x170 irq 15 on atapci0
> > ad0: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata0-master tagged UDMA100
> > ad2: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata1-master tagged UDMA100
> 
> I have very similar hardware. I should be able to reproduce any given disk 
> load. Perhaps we should take this off list and try a few things.
> 
> Before I go, I should mention that I did have similar "tag" error messages a 
> few weeks ago. I also had a reproducible panic when starting vinum from 
> single user mode. This turned out to be one (or more) of the following.
> 
> o	1 bad RAM stick
> o	1 marginal (on spec) RAM stick
> o	Aggressive BIOS settings
> o	Air filters clogged
> o	Unseasonably warm weather (+10F)
> o	Phase of the moon

Of all of those, it could only be a BIOS setting or RAM, and that's
if hardware is the problem at all.  The machine has been running for a
year and a half, and the panics have only started recently (the last
6-9 months or so).

> Find a display card and run memtest86 for an hour or so. Take a note
> of the memory throughput (for BIOS tuning).

Eh, not so wild about the prospect of the machine dumping given that
I'm 700+mi away from this particular box, but next time I'm in the
data center I will... but like I said, I don't think it's hardware.

-sc

-- 
Sean Chittenden


More information about the freebsd-stable mailing list