smartd long self-test causes drives to hang
Jo Rhett
jrhett at netconsonance.com
Mon Nov 24 13:50:47 PST 2008
On re-reading my earlier message, I realized it was in danger of
being content-free, so here are the details of the system.

It is a gmirror whole-disk mirror of two Seagate 300 GB drives:
$ atacontrol list
ATA channel 0:
    Master:  ad0 <ST3300622A/3.AAH> ATA/ATAPI revision 7
    Slave:   ad1 <ST3300622A/3.AAH> ATA/ATAPI revision 7

$ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 575427344
Providers:
1. Name: mirror/gm0
   Mediasize: 300069051904 (279G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad0
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3917165570
2. Name: ad1
   Mediasize: 300069052416 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3874187635

On Nov 24, 2008, at 12:48 PM, Jo Rhett wrote:
> I've spent about 3 months tracking down what was causing my personal
> colo box to start getting "sluggish" right around dawn every
> Saturday morning. It took so long because some mornings I simply
> couldn't pull my head out of my tail enough to do proper debugging.
>
> The cause was *really slow* filesystem response time. No cron jobs
> ran in that period. No specific process ran any slower than another,
> although I eventually learned that those which did no file I/O were
> fine. And finally I realized that just "ls -la" was very slow (~1
> minute) even after I had killed off every disk-using process in the
> system. SMTP and HTTP in particular were basically fubar.
>
> No data loss, just *real slow*. Nothing other than a soft reboot
> ever solved the problem. Even leaving it running only minimal
> processes for 24 hours didn't bring it back to normal.
>
> Finally I was browsing through Jeremy Chadwick's list of known ATA
> problems and spotted his comments about smartd self-tests causing
> problems. Sure enough, my long self-test was scheduled for 5am on
> Saturday mornings. Rechecking the observed slow-down periods
> confirmed that the problem never became visible before 5am.
> (Sometimes it took up to 45 minutes before things slowed down enough
> to set off monitoring alarms.)
>
> So, long story short, if you're seeing weirdness in system response
> time, check the smartd configuration and try disabling the
> self-tests. The short self-test I was running daily didn't appear
> to affect anything, but the long test left the system shuddering
> and limping at best.
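
For anyone wanting to try the same change, the schedule described above
might look roughly like the following smartd.conf sketch. The device
names come from the atacontrol output; the 02:00 hour for the daily short
test, the use of -a, and the file path are assumptions, since the actual
configuration was not posted in this thread.

```
# /usr/local/etc/smartd.conf
# (path assumes smartmontools was installed from FreeBSD ports)
#
# The -s directive takes a regular expression matched against T/MM/DD/d/HH,
# where T is the test type (S = short, L = long) and d is the day of the
# week (1 = Monday ... 7 = Sunday).

# Before (assumed): daily short test at 02:00, long test Saturdays at 05:00.
#/dev/ad0 -a -s (S/../.././02|L/../../6/05)
#/dev/ad1 -a -s (S/../.././02|L/../../6/05)

# After: keep the daily short test, drop the long test.
/dev/ad0 -a -s S/../.././02
/dev/ad1 -a -s S/../.././02
```

smartd has to be restarted (or sent SIGHUP) to pick up the new schedule.
Running "smartctl -l selftest /dev/ad0" prints the drive's self-test log,
which may help correlate past long tests with observed slowdowns.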