Disk 100% busy

Brad Knowles brad at stop.mail-abuse.org
Sun Oct 16 18:27:15 PDT 2005


At 9:16 AM -0400 2005-10-16, Will Saxon wrote:

>  In this case, my mail gateway is a dual 3.06GHz Xeon with 1GB of ram
>  and 2 36GB 15krpm drives in a raid-1 on a smart array 6i (cciss)
>  controller. I am running FreeBSD 5.4-RELEASE-p1.
>
>  Systat -vmstat reports the disk mirror is 100% busy at all times on this
>  machine, with an average of around 300 tps at 15KB/t.

	Note that RAID-1 is the second worst-case for mail server 
performance -- it accelerates reads (if you have mirror 
load-balancing), but all writes are required to be held until 
complete on both disks.  The only worse case would be RAID-5, where 
you have to write (or re-write) an entire RAID block at once, plus 
the parity information.


	For mail servers, you really want to watch your synchronous 
meta-data updates.  FreeBSD is a good choice here, if you've got Soft 
Updates enabled (I think that FreeBSD 5.x does that by default).

	But, you also want to watch your directory sizes.  If the 
directory size gets too large, then it takes too long to lock the 
directory against any other updates, scan through the entire 
directory to make sure there aren't any collisions, create/delete the 
file, then unlock the directory -- a process which has to be done 
every time a file is created or deleted.  This is why most modern 
mail servers use a "hashed queue" scheme, so that you can greatly 
increase the chances of multiple processes working simultaneously 
without stepping all over each other's toes.
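
	To make the "hashed queue" idea concrete, here is a minimal
Python sketch.  It is not how sendmail or any particular MTA lays out
its queue; the function name, the bucket count of 256, and the use of
SHA-1 are all assumptions chosen for illustration.  The point is only
that each file lands in one of many small subdirectories, so no single
directory has to be locked and scanned for every create or delete.

```python
import hashlib
import os


def hashed_queue_path(queue_root, message_id, buckets=256):
    """Map a message ID to a file path inside one of `buckets` subdirectories.

    Hypothetical helper: the name, the default bucket count, and the hash
    function are illustrative assumptions, not any real MTA's layout.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Use the leading 32 bits of the digest to pick a bucket.
    bucket = int(digest[:8], 16) % buckets
    return os.path.join(queue_root, "%02x" % bucket, message_id)
```

	The same message ID always maps to the same subdirectory, so a
process that knows the ID can find the file again without searching.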

	However, with regard to directory size issues, keep in mind that 
even if the directory does not currently have 100,000 files in it, if 
it ever had 100,000 files in it in the past, it's still got all those 
empty directory slots lying around and that still slows things down 
a lot.  If you suspect that this may have happened in the past, you 
need to stop the offending program, move the old directories aside, 
create new directories with the same ownership/permissions, then 
restart the program.

	And don't forget to clean out the old directories you moved 
aside, for example by running some manual queue runners against them, 
or by whatever other means fits your setup.
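
	The move-aside step described above could be sketched in Python
roughly as follows.  The function name and the ".old" suffix are made
up for illustration, and the sketch assumes the program that writes to
the directory has already been stopped and that you are running with
enough privilege (usually root) to set the ownership.

```python
import os
import stat


def move_aside_and_recreate(path, suffix=".old"):
    """Rename `path` out of the way and create a fresh, empty directory there.

    Hypothetical helper. Stop whatever writes to `path` before calling it,
    and restart it afterwards. Returns the path of the moved-aside directory.
    """
    info = os.stat(path)
    aside = path + suffix
    if os.path.exists(aside):
        raise FileExistsError(aside)
    os.rename(path, aside)   # the old, bloated directory moves aside
    os.mkdir(path)           # the replacement starts out small
    # Copy ownership first, then mode; changing the owner usually needs root.
    os.chown(path, info.st_uid, info.st_gid)
    os.chmod(path, stat.S_IMODE(info.st_mode))
    return aside
```

	Whatever is left in the moved-aside directory still has to be
drained separately; the sketch does not touch its contents.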


	In your case, while the MTA may be configured in a way to avoid 
most of these issues, the anti-virus scanning solution may not.  So, 
you may need to look at how the scanner lays out its working 
directories and deal with the same issues there.


	If you want to find out how all these issues affect the MTA, you 
need to read the book "sendmail Performance Tuning" by Nick 
Christenson (see <http://www.jetcafe.org/npc/book/sendmail/>).  Once 
you read this book, you will hopefully have a better idea of how 
these same issues may affect your anti-virus scanning solution, and 
what you may need to do about it.

	I also recommend the slides from Nick's "Performance Tuning 
Sendmail Systems" paper at 
<http://www.jetcafe.org/npc/doc/performance_tuning.pdf>, as well as 
my own slides on the same general subject at 
<http://www.shub-internet.org/brad/papers/sendmail-tuning/>.

>                                                         This seems wrong
>  to me, as these numbers are maintained even when the system doesn't
>  otherwise appear busy. We don't seem to be swamped by log writes.

	How can you be sure?  How are you logging information today?  Is 
that being logged to a separate filesystem on a separate disk system?

>                                                                     How
>  can I tell what's generating these disk writes? At the moment the 100%
>  disk utilization is the only thing I can see that would cause the
>  scanning delay. The machine overall is sluggish with file operations.

	You have a certain amount of information available to you from 
tools like vmstat and iostat, as well as systat.  However, in order 
to understand how to use them to see where your problems really lie, 
you need information such as provided in Nick's book.  You should 
also read other books on overall system performance tuning.  The 
O'Reilly book on this subject (see 
<http://www.oreilly.com/catalog/spt2/>) is a good start, even though 
it is a few years old.
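
	As a first sanity check on the numbers you quoted (around 300
tps at 15KB/t), a few lines of Python give the implied throughput.
This is only a back-of-the-envelope sketch: the function name is made
up, and it assumes that KB means 1024 bytes and that transfers are
serviced one at a time.

```python
def disk_load_summary(tps, kb_per_transfer):
    """Return (MB/s moved, average ms per transfer on a fully busy disk).

    Hypothetical helper; assumes 1 MB = 1024 KB and that the device
    handles one transfer at a time.
    """
    mb_per_s = tps * kb_per_transfer / 1024.0
    ms_per_transfer = 1000.0 / tps
    return mb_per_s, ms_per_transfer


mb, ms = disk_load_summary(300, 15)
print("%.1f MB/s, %.1f ms per transfer" % (mb, ms))
```

	That works out to roughly 4.4 MB/s at about 3.3 ms per
transfer.  A figure that low on a disk that is 100% busy would
suggest lots of small, scattered operations rather than a shortage of
raw bandwidth, but you would need the per-device output of the tools
above to confirm it.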

-- 
Brad Knowles, <brad at stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.


More information about the freebsd-stable mailing list