problems with AHCI on FreeBSD 8.2

Tue Feb 14 22:39:29 UTC 2012

Thank you again Jeremy, it sure helps!

On Tue, Feb 14, 2012 at 9:31 PM, Jeremy Chadwick
<freebsd at jdc.parodius.com> wrote:
> On Tue, Feb 14, 2012 at 09:19:02PM +0100, Oscar Prieto wrote:
>> Thank you Jeremy, I'm already checking your links.
>>
>> When I installed smartd I configured a daily short test and a weekly
>> long one for all the drives, scheduled while the machine remains
>> mostly unused. From the documentation and other info I had read, I
>> never thought that could be a problem.
>>
>> # /usr/local/etc/smartd.conf
>> /dev/ada0 -a -o on -S on -s (S/../.././03|L/../../2/07)
>> /dev/ada1 -a -o on -S on -s (S/../.././04|L/../../3/07)
>> /dev/ada2 -a -o on -S on -s (S/../.././05|L/../../4/07)
>> /dev/ada3 -a -o on -S on -s (S/../.././06|L/../../5/07)
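
For readers unfamiliar with the -s directive: each alternative above is a
regular expression matched against a string of the form T/MM/DD/d/HH (test
type, month, day of month, day of week with 1 = Monday, hour). Below is a
minimal Python sketch of how such a schedule can be evaluated. It assumes a
whole-string match, the helper name due_tests is hypothetical, and it is an
illustration rather than smartd's own code.

```python
import re
from datetime import datetime

def due_tests(schedule, when):
    """Return the self-test types (L, S, C, O) that `schedule` selects at `when`."""
    # Build the MM/DD/d/HH part; isoweekday() yields 1 (Monday) .. 7 (Sunday).
    stamp = when.strftime("%m/%d/") + str(when.isoweekday()) + when.strftime("/%H")
    return [t for t in "LSCO" if re.fullmatch(schedule, f"{t}/{stamp}")]

ada0 = r"(S/../.././03|L/../../2/07)"  # schedule string from the ada0 line above

print(due_tests(ada0, datetime(2012, 2, 14, 3)))  # Tuesday 03:00 -> ['S']
print(due_tests(ada0, datetime(2012, 2, 14, 7)))  # Tuesday 07:00 -> ['L']
print(due_tests(ada0, datetime(2012, 2, 15, 7)))  # Wednesday 07:00 -> []
```

Read this way, the four lines stagger the short tests over 03:00 to 06:00
every day and the long tests over Tuesday to Friday at 07:00.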
>
> The problem is that, quite honestly, these tests do you zero good.  All
> they do is clutter the SMART self-test log.
>
> Take for example your situation with ada3: smartd(8) told you that the
> number of pending sectors increased to 5, and uncorrected increased to
> 1.  That's really all you need to know at that point.  If you want to
> know the LBA numbers which are problematic, you can manually intervene.
>
> The point is: the drive itself is going to notice problematic or bad
> sectors quicker than periodic short or long or surface scan tests will.
> Let the drive do its thing normally and only use SMART tests when
> there's indication something is wrong.
>
>> I'll remove the checks. Do you advise removing the daemon altogether?
>
> smartd(8) is useful because it keeps track of attributes which change in
> value and logs data to syslog (if I remember right), thus you have an
> exact time/date when an attribute changed.  This is especially useful
> for things pertaining to sector/physical media problems.
>
> As such, I tend to recommend that folks using smartd(8) tune their
> smartd.conf to only monitor specific attributes.  This varies from drive
> to drive, but the key ones are things like attributes 5, 10, 11, 192,
> 193, 194 (if you want temperature logging), 196, 197, 198, 199, and 200.
> I'm speaking strictly for Western Digital disks here.
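
The idea of watching only a handful of attributes can be sketched outside of
smartd as well. The Python below is a minimal illustration, not a smartd
feature: it compares two snapshots of `smartctl -A` output and reports only
the watched IDs whose RAW_VALUE changed. The function names and the sample
rows are made up for the example (the pending/uncorrected values mirror the
ada3 report discussed in this thread), and it assumes the usual ten-column
attribute table.

```python
# Attribute IDs suggested in the message above (Western Digital disks).
WATCHED = {5, 10, 11, 192, 193, 194, 196, 197, 198, 199, 200}

def parse_attributes(report):
    """Map attribute ID -> (name, first RAW_VALUE token) from `smartctl -A` text."""
    table = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            table[int(fields[0])] = (fields[1], fields[9])
    return table

def changed(before, after, watched=WATCHED):
    """List (id, name, old_raw, new_raw) for watched attributes that changed."""
    old, new = parse_attributes(before), parse_attributes(after)
    return [(i, new[i][0], old[i][1], new[i][1])
            for i in sorted(watched)
            if i in old and i in new and old[i][1] != new[i][1]]

# Hypothetical sample rows, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       {ecc}
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       {pending}
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       {uncorrected}
"""
before = SAMPLE.format(ecc=0, pending=0, uncorrected=0)
after = SAMPLE.format(ecc=838938239, pending=5, uncorrected=1)

for row in changed(before, after):
    print(row)
```

Attribute 195 is deliberately absent from the watched set, so its large raw
value jump is not reported.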
>
> The stock defaults, if I remember right, are to "monitor everything",
> which really doesn't work well given that so many vendors encode their
> RAW_VALUE fields in proprietary/vendor-specific formats.  People will
> often monitor things like the Hardware_ECC_Recovered attribute and start
> "freaking out" one day when the value goes from 0 to 838938239 or
> something larger.  Attribute data formats are not part of the ATA
> standard, so each vendor picks its own encoding.  Plus, not many
> admins that I've run into (honest) know what that attribute actually
> means disk-wise (hint: it's 100% normal for sector ECC to happen at all
> times; magnetic media is not perfect, that's what the per-sector ECC
> section is for!)
>
> However: people don't understand what SMART attribute acquisition
> actually does behind the scenes -- it results in the disk having to read
> from the HPA area (not user accessible or within LBA regions), which
> means seeking + moving the arms to an area, reading, then reporting all
> of this back.  Thus, it impacts I/O performance.  This is why I don't
> use smartd(8) on any of our systems.  But if I were to use it?  I would
> have it poll maybe every 120 minutes, rather than the default of every
> 30.  It all depends on the system/load/etc.  I've seen people poll every
> 5 minutes (I think they're absolutely crazy/paranoid).  Their systems,
> their problem.  :-)
>
> Hope this helps.
>
> --
> | Jeremy Chadwick                                 jdc at parodius.com |
> | Parodius Networking                     http://www.parodius.com/ |
> | UNIX Systems Administrator                 Mountain View, CA, US |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
>