smartd long self-test causes drives to hang

Jo Rhett jrhett at netconsonance.com
Mon Nov 24 13:50:47 PST 2008


On rereading my message, I realized it was in danger of being
content-free. Here is the setup:

gmirror whole-disk mirror of two Seagate 300 GB drives (ST3300622A):

$ atacontrol list
ATA channel 0:
     Master:  ad0 <ST3300622A/3.AAH> ATA/ATAPI revision 7
     Slave:   ad1 <ST3300622A/3.AAH> ATA/ATAPI revision 7

$ gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 575427344
Providers:
1. Name: mirror/gm0
    Mediasize: 300069051904 (279G)
    Sectorsize: 512
    Mode: r5w5e6
Consumers:
1. Name: ad0
    Mediasize: 300069052416 (279G)
    Sectorsize: 512
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: DIRTY
    GenID: 0
    SyncID: 1
    ID: 3917165570
2. Name: ad1
    Mediasize: 300069052416 (279G)
    Sectorsize: 512
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: DIRTY
    GenID: 0
    SyncID: 1
    ID: 3874187635


On Nov 24, 2008, at 12:48 PM, Jo Rhett wrote:
> I've spent about 3 months tracking down what was causing my personal
> colo box to get "sluggish" right around dawn every Saturday morning.
> It took so long because some mornings I simply couldn't pull my head
> out of my tail enough to do proper debugging.
>
> The cause was *really slow* filesystem response time. There were no
> cron jobs in that period. No specific process ran any slower than
> another, although I eventually learned that processes which did no
> file I/O were fine. Finally I realized that even a plain "ls -la" was
> very slow (~1 minute), even after I had killed off every disk-using
> process on the system. SMTP and HTTP in particular were basically fubar.
>
> No data loss, just *real slow*. Nothing other than a soft reboot ever
> solved the problem. Even leaving it running only minimal processes for
> 24 hours didn't bring it back to normal.
>
> Finally I was browsing through Jeremy Chadwick's list of known ATA
> problems and spotted his comments about smartd self-tests causing
> problems. Sure enough, my long self-test was scheduled for 5am on
> Saturday mornings. Rechecking the observed slow-down periods
> confirmed that the problem never became visible before 5am.
> (Sometimes it took up to 45 minutes before things slowed down enough
> to set off monitoring alarms.)
>
> So, long story short: if you're seeing weird system response times,
> check the smartd configuration and try disabling the self-tests. The
> short self-test I was running daily didn't appear to affect anything,
> but the long test left the system shuddering and limping at best.
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
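
For illustration only, a schedule like the one described in the message might look like this in smartd.conf (commonly /usr/local/etc/smartd.conf when smartmontools is installed from FreeBSD ports). This is a sketch assuming the smartmontools `-s T/MM/DD/d/HH` schedule syntax, where T is the test type (S = short, L = long), d is the day of the week (1 = Monday, so 6 = Saturday) and HH is the hour in which the test starts. The Saturday 5am long test matches the message; the 02 hour for the daily short test is a hypothetical value, since the message does not say when that test ran.

```
# Before (hypothetical reconstruction): daily short test in the 02:00 hour,
# long test on Saturdays (day 6) in the 05:00 hour, on both mirror members.
#/dev/ad0 -a -s (S/../.././02|L/../../6/05)
#/dev/ad1 -a -s (S/../.././02|L/../../6/05)

# After: keep the daily short test, drop the long self-test.
/dev/ad0 -a -s S/../.././02
/dev/ad1 -a -s S/../.././02
```

smartd rereads its configuration file when it receives SIGHUP, so the edited schedule can be picked up without restarting the machine. To see whether a self-test is currently in progress on a drive, `smartctl -c /dev/ad0` reports the drive's self-test execution status.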


