Panic 8.2 PRERELEASE WRITE_DMA48

Mon Jan 10 06:51:17 UTC 2011

On Mon, Jan 10, 2011 at 07:13:57AM +0100, Tom Vijlbrief wrote:
> 2011/1/9 Jeremy Chadwick <freebsd at jdc.parodius.com>:
> 
> >
> > Not to get off topic, but what is causing this?  It looks like you have
> > a cron job or something very aggressive doing a "smartctl -t short
> > /dev/ad4" or equivalent.  If you have such, please disable this
> > immediately.  You shouldn't be doing SMART tests with such regularity;
> > it accomplishes absolutely nothing, especially the "short" tests.  Let
> > the drive operate normally, otherwise run smartd and watch logs instead.
> >
> 
> I have this default entry (from the author of that file) in
> smartd.conf and enabled it on many machines over the years.
> Is it a bad practice?
> 
> # First (primary) ATA/IDE hard disk.  Monitor all attributes, enable
> # automatic online data collection, automatic Attribute autosave, and
> # start a short self-test every day between 2-3am, and a long self test
> # Saturdays between 3-4am.
> /dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

I'll have to talk to Bruce Allen about that.  Those entries in
smartd.conf are pretty old (meaning they've existed for a very long
time, and chances are Bruce hasn't gone back to revamp them or
reconsider the logic/justification behind them).
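
For anyone unsure how to read the -s expression in the quoted entry: as far as I know, smartd builds a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) and starts a test when the regular expression matches that whole string. The sketch below imitates that in Python. The function name due_tests is made up for illustration, and Python's re module stands in for the POSIX extended regular expressions smartd actually uses; the two should agree for a pattern this simple.

```python
import re
from datetime import datetime

# The schedule expression from the quoted smartd.conf entry.
SCHEDULE = r"(S/../.././02|L/../../6/03)"

def due_tests(when, schedule=SCHEDULE):
    """Return the test types (L = long, S = short) scheduled for the given hour."""
    due = []
    for test_type in ("L", "S"):
        # Build the T/MM/DD/d/HH string; isoweekday() gives 1 = Monday .. 7 = Sunday.
        stamp = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
        # Assumption: the expression has to match the entire string.
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due

print(due_tests(datetime(2011, 1, 10, 2, 30)))  # Monday 02:30   -> ['S']
print(due_tests(datetime(2011, 1, 15, 3, 5)))   # Saturday 03:05 -> ['L']
print(due_tests(datetime(2011, 1, 15, 12, 0)))  # Saturday noon  -> []
```

So the quoted entry means a short test every day in the 02:00 hour and a long test every Saturday in the 03:00 hour, which matches the comment above it.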

I'm an opponent of running SMART tests automatically, given what some do
to drives.  It's important to remember that most SMART tests can be done
while the drive is in operation, and some of these tests stress the
drive, which could potentially cause timeouts or other I/O anomalies
(data loss is unlikely, but odd errors may occur; it all depends on the
firmware).  This is especially important WRT "long" tests.  For example,
on newer 2TB Western Digital Caviar Black drives, a long test does
something that I haven't heard (yes, heard) any other drive do -- it
emits a noise that's almost identical to that of a head crash.  It could
be scanning a very specific region of LBAs (possibly out-of-range
sectors, e.g. spares) repetitively, but it sounds nothing like a
selective LBA scan.  Honestly it does sound like a head crash.  Is this
something you'd really want to be running every 7 days?

I've always advocated that people run smartd only if they want to
monitor attributes -- which ultimately are the most important things to
keep an eye on anyway.  It's even more important to know how to read
them.  :-)  90% of drives out there update their attributes at set
intervals or when the SMART READ DATA command is encountered.

And honestly I've never seen a SMART short test do anything useful, on
any drive I've used (SATA or SCSI; WD, Seagate, Maxtor, Hitachi,
Fujitsu).  Long tests are different in this regard.

I'm fully aware that the terms "short" and "long" are vague in nature
and don't really tell a person what the drive is doing behind the
scenes.  Sadly that's the nature of SMART; they're just tests that are
defined on a per-vendor (or per-disk-model!) basis.  But as my 2nd
paragraph above implies, the behaviour is not consistent.

So when people ask me "how do I monitor my disks reliably with SMART
then?", I tell them to either do it by hand (which is what I do), or run
smartd(8) and keep an eye on their logs.  This requires some tuning and
familiarity with what each attribute means, again on a per-drive or
per-vendor basis.
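
As a rough illustration of the "by hand" approach, here is a small Python sketch that reads an attribute table in the layout printed by "smartctl -A" and reports watched attributes whose raw value is non-zero. Everything specific in it is an assumption: the sample table is made up rather than taken from a real drive, the function names are hypothetical, and the watched IDs (5, 197, 198) are just a common starting point, since which attributes matter and how their raw values should be read varies per drive and per vendor.

```python
# Attribute IDs to watch: an assumption for illustration, not a list from this thread.
WATCHED = (5, 197, 198)

# Made-up sample in the layout printed by "smartctl -A"; not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4021
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

def parse_attributes(text):
    """Map attribute ID -> (name, raw value string) for each table row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            rows[int(fields[0])] = (fields[1], " ".join(fields[9:]))
    return rows

def nonzero_watched(rows, watched=WATCHED):
    """Return the watched IDs whose raw value starts with a non-zero integer."""
    flagged = []
    for attr_id in sorted(watched):
        if attr_id in rows:
            raw = rows[attr_id][1].split()[0]
            if raw.isdigit() and int(raw) != 0:
                flagged.append(attr_id)
    return flagged

rows = parse_attributes(SAMPLE)
for attr_id in nonzero_watched(rows):
    name, raw = rows[attr_id]
    print(f"attribute {attr_id} ({name}) raw value is {raw}")
```

In real use the text would come from running something like "smartctl -A /dev/ad4" as root instead of the SAMPLE string, and a non-zero raw value is only a prompt to look closer, not a verdict on the drive.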

It's great that there's no actual standard for these, isn't it?  :-)

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |