Short SMART check causes disk op timeouts

Jeremy Chadwick koitsu at FreeBSD.org
Mon Oct 27 15:03:26 PDT 2008


On Mon, Oct 27, 2008 at 05:30:26PM -0400, Alexandre Sunny Kovalenko wrote:
> On Mon, 2008-10-27 at 21:45 +0100, Miroslav Lachman wrote:
> > Jeremy Chadwick wrote:
> > 
> > > On Mon, Oct 27, 2008 at 08:50:44PM +0100, martinko wrote:
> > > 
> > >>Jeremy Chadwick wrote:
> > >>
> > >>>On Mon, Oct 27, 2008 at 07:52:01PM +0100, martinko wrote:
> > >>>
> > >>>>Jeremy Chadwick wrote:
> > >>>>
> > >>>>>>>>Now, does the timeout cause loss of any data? Is there anything besides
> > >>>>>>>>disabling the testing that I can do about it?
> > >>>>>>>
> > >>>>>>>Do you understand what short and long offline tests actually do and what
> > >>>>>>>they're used for?  :-)  If so, you'd know that running them periodically
> > >>>>>>>is more or less silly (IMHO).
> > >>>>>>
> > >>>>>>I do not, not completely :) I think I have just copied the settings from
> > >>>>>>somewhere and only just tweaked it a bit whenever I have added a disk.
> > >>>>>
> > >>>>>Let me know if you figure out who or what online resource solicited
> > >>>>>adding daily short/long tests, as I'd like to talk to them about their
> > >>>>>decision.  I have a feeling whoever thought it up felt that the tests
> > >>>>>were performing entire sector scans of the entire disk, which is simply
> > >>>>>not the case.
> > >>>>>
> > >>>>
> > >>>>Hallo,
> > >>>>
> > >>>>Reading this thread I checked my config to find this: ;-)
> > >>>>
> > >>>>#/dev/ad0 -a -n standby,q -o on -S on -s (S/../.././02|L/../../7/03) -m root  # ++ 2006-11-03 mato
> > >>>>/dev/ad0 -a -o on -S on -s (S/../.././02|L/../../7/03) -m root  # ++ 2006-11-03 mato
> > >>>>
> > >>>>I believe I came up with the settings after reading the manual page /
> > >>>>documentation of the tool.
> > >>>
> > >>>Can you explain why you're doing this?  So far no one's provided a
> > >>>reason *why* they're doing short and long offline scans on a daily
> > >>>basis.  I'm under the impression the conclusion was reached like this:
> > >>>"man smartd.conf ... oh, -s, a neat thing, let's enable it".
> > >>>
> > >>>There are negative repercussions to doing tests of this nature at such
> > >>>regular intervals.  Once-a-week is borderline acceptable; once a month
> > >>>would be quite reasonable.  I'd love to know what kind of effect daily
> > >>>tests have on MTBF; I can imagine it's reached much sooner with this.
> > >>>
> > >>>The main point of smartd is to monitor SMART attribute changes.  If
> > >>>you're concerned about the health of your hard disk, you should be
> > >>>looking at your logs and not relying on things like automatic short/long
> > >>>tests.  Most SMART attributes are updated immediately and not during an
> > >>>offline test, and all of those attribute changes will be logged.
> > >>>
> > >>
> > >>You asked Miroslav about the source of his configuration.  As it is very
> > >>similar to mine, I think we both got it from the smartd documentation.  Where
> > >>else would one look for information?  It's the usual source.  So if you think
> > >>it's wrong, please contact the authors; we're obviously just users.
> > >>Thanks.
> > > 
> > > 
> > > I'm not asking *where* you got the information from (we know where you
> > > and others got it from: the documentation).  I'm asking you *why* you
> > > enabled what you did, because this is not something smartd.conf enables
> > > by default (the example is commented out).
> > > 
> > > If you *really* want me to talk to Bruce about this, I can/will, but I'm
> > > left with the impression that the example in smartd.conf is there to
> > > show people the syntactical usage of -o, and not to advocate its usage.
> > > 
> > > 
> > >>PS: Btw, the long offline scan is scheduled on a weekly basis, not daily.
> > >>Whether that's good or not I do not know.
> > > 
> > > 
> > > The OP's long scan is also scheduled on a weekly basis (every Sunday),
> > > but his short scan trumps it.
> > > 
> > > Folks, the point I'm trying to make here is that daily -- and even
> > > weekly -- SMART offline tests are unnecessary.  If you're that concerned
> > > about your disk health, you should be looking at your syslog logs for
> > > attribute changes that indicate drive issues.  Performing SMART offline
> > > tests at regular intervals like this does very little other than
> > > increase wear/tear on drive components (not necessarily the physical
> > > platters/heads; there are many pieces to a hard disk.  :-) )
> > 
> > I started using smartd more than three years ago and have not changed
> > my configs since then, just copied them to all the new servers, so I
> > can't tell why I had the feeling that a daily short and weekly long
> > test is "the right way".
> > Do you have a link to a brief overview where we can read something
> > about "the best practices" with smartd? Or may I just change the config
> > to do a short test once a week and a long test once a month?
> > 
> > Miroslav Lachman
> > 
> > PS: all examples in smartd.conf.sample are commented out (DEVICESCAN is
> > the default), but almost all of the examples have a weekly long test, which
> > may have led to our conclusion that "a weekly long test is good"
> They are *not* commented out in the example configuration found while
> reading the man page. All of the examples in the man page use daily
> short and weekly long offline tests. Since it would have been as
> illustrative to depict monthly short and annual long tests, most of the
> readers assumed that it is The Good Thing. If it indeed is not, someone
> should kindly ask the man page author to use different frequencies or, at
> least, vary them from example to example, so they are less suggestive.

Are people forgetting that this is a 3rd-party piece of software that's
in the ports tree?  Why is everyone demonising what's in the man page,
the example/sample configuration, etc. etc. when nobody controls this
but the author of the software?  I realise the port maintainer has some
degree of responsibility, but port maintainers -- as a default -- have
to make the assumption that the default/sample configuration files that
come with the software are sufficient.

I had no idea users were blindly uncommenting examples in
smartd.conf.sample without reading what the features do.  Then again, I
guess many users/admins have no idea what sort of impact offline tests
could have on a system.  Short/long tests should not have any effect on
a running/used disk -- and most do not see any effect -- but under high
I/O I would assume there is a chance the suspend/resume aspect of SMART
tests could take longer than 5 seconds.  I am disappointed, though, that
people often schedule "maintenance things" all at the same time (between
0200 and 0500) but never think about the implications of them all running
in parallel.

I'll get in contact with Bruce Allen (author of smartmontools) and
discuss all of this with him.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
