identifying and fixing server I/O slowdowns

Fri Aug 6 22:30:51 PDT 2004

Jeff Kramer wrote:
> Oh great and wise FreeBSD gurus,
> 
> I've been running FreeBSD boxes for about five years with great results 
> (up to 6 at the moment), but recently one of my machines has started to 
> seriously act up.  Every time a heavy disk operation (say, tar'ing a 1 
> gig directory) occurs the system slows to a crawl, and requests to 
> apache/php/mysql sites hosted on it just hang.
> 
> The system is a dual p3 1.13ghz box with a gig of ram and mirrored 80 
> gig WD800BB drives on a Promise TX2 controller.  The raid isn't 
> degraded.  There's a dedicated 1.5 gig swap partition and a swap file on 
> the /usr partition.  We had some apache processes go nuts one time, 
> which is why I added the swap file.
> [...]

This problem could be due to a disk drive that is about to fail.  If
there are (still recoverable) disk errors, retrying the affected I/O
operations can keep the disk controller occupied for several seconds.
During that time, every process that tries to do disk I/O will block.
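
One way to check whether such multi-second stalls actually occur is to
time small synchronous writes while the box is otherwise idle.  The
following is only a minimal sketch in Python under stated assumptions:
the function names, the probe file location and the one-second
threshold are made up for illustration and are not part of any
FreeBSD tool.

```python
import os
import time


def probe_write_latency(path, samples=20, interval=0.5, block=b"x" * 4096):
    """Write and fsync a small block repeatedly; return each write's latency in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(samples):
            start = time.monotonic()
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data out to the disk
            latencies.append(time.monotonic() - start)
            time.sleep(interval)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the probe file again
    return latencies


def stalls(latencies, threshold=1.0):
    """Return (sample index, latency) for every write slower than threshold seconds."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold]


# Hypothetical usage; the path is an assumption, pick a file on the suspect array:
#   print(stalls(probe_write_latency("/usr/tmp/ioprobe.tmp")))
```

On a healthy, idle disk each write should finish in a small fraction
of a second.  Occasional writes that take whole seconds would fit the
retry behaviour described above, although a busy system can produce
slow writes as well.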

Since the errors are (eventually) recoverable, the raid array is likely
to _not_ drop into degraded mode by itself.  Once you've found out
which of the disks is failing, you would have to force that disk into
failed mode and then replace it.  The exact details depend on your raid
controller.
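
To narrow down which disk of the mirror is the slow one, it may help
to read from each member separately and compare the slowest single
read.  Again only a sketch under assumptions: the function names are
illustrative, the device names in the usage comment are hypothetical
(use whatever your system shows for the two drives), and it only
samples the start of each disk, so errors located elsewhere on the
platter would go unnoticed.

```python
import os
import time


def time_reads(path, chunk=1 << 20, chunks=64):
    """Read up to `chunks` blocks of `chunk` bytes from path; return each read's duration."""
    durations = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(chunks):
            start = time.monotonic()
            data = os.read(fd, chunk)
            durations.append(time.monotonic() - start)
            if not data:  # end of file or device reached
                break
    finally:
        os.close(fd)
    return durations


def worst_read(paths, **kwargs):
    """Map each path to its slowest single read, in seconds."""
    return {p: max(time_reads(p, **kwargs)) for p in paths}


# Hypothetical usage (needs root; device names are assumptions):
#   print(worst_read(["/dev/ad4", "/dev/ad6"]))
```

If one disk shows reads that take seconds while the other stays in
the millisecond range, that disk is the more likely candidate.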

Of course, your mileage may vary, but I've experienced disk failures 
like these several times in the past, with the effect you've described.

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini at geminix.org  |  http://www.escapebox.net