Hard disk bottle neck.

Danny Do ai_quoc at hotmail.com
Sun Sep 28 14:50:12 UTC 2008


Hi Matthew & Wojciech Puchar and others,

First of all, I'd like to correct a typo in my original message and restate
a few details:
- I have 6x300GB SCSI 10K RPM hard drives.
- Most of my files are about 100MB, and many are as big as 1GB.
- Caching is not an option.

Thanks for the advice, but caching is not an option for me, as most of my
files are about 100MB and many are as big as 1GB.

I tried Lighty (lighttpd) a few years ago, but it didn't help. I think the
problem is disk seeks. If I can reduce the number of seeks by increasing the
read buffer, I think the problem would be solved.
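
As a rough illustration of the seek argument, here is a back-of-envelope
model in Python. It is only a sketch: the 8 ms positioning time and 60 MB/s
sustained transfer rate are assumed figures for a single 10K RPM drive, not
measurements from this server, and the function name is made up for the
example. It treats every read as one seek followed by one sequential
transfer and ignores RAID striping, queueing and caching.

```python
def random_read_throughput(read_size_bytes, seek_ms=8.0, transfer_mb_per_s=60.0):
    """Estimate bytes/second when every read costs one seek plus one transfer.

    seek_ms and transfer_mb_per_s are assumed figures, not measurements.
    """
    seek_s = seek_ms / 1000.0
    transfer_s = read_size_bytes / (transfer_mb_per_s * 1_000_000)
    return read_size_bytes / (seek_s + transfer_s)


if __name__ == "__main__":
    # Compare a few read sizes, including the 64K and 640K cases discussed.
    for size_kib in (64, 128, 640, 1024):
        bps = random_read_throughput(size_kib * 1024)
        print(f"{size_kib:5d} KiB reads: about {bps * 8 / 1e6:6.1f} Mbit/s per disk")
```

Under these assumptions, larger reads do amortize the seek cost, but the
gain is less than proportional: going from 64K to 640K reads cuts the number
of seeks by 90%, yet the estimated throughput rises by a bit under five
times, because the transfer time itself starts to dominate.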

I am thinking of trying Wojciech Puchar's method, patching the kernel with
the following change:

patch /usr/src/sys/sys/param.h

#ifndef DFLTPHYS
#define DFLTPHYS        (1024 * 1024)   /* default max raw I/O transfer size */
#endif
#ifndef MAXPHYS
#define MAXPHYS         (1024 * 1024)   /* max raw I/O transfer size */
#endif
#ifndef MAXDUMPPGS
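
To see what the larger limit could mean for one file, the short Python
sketch below counts how many transfers a file needs at a given maximum
transfer size. It assumes the stock MAXPHYS is 128 KiB (check your own
param.h before relying on that) and that every request is issued at the
full maximum size, which a real workload will not guarantee. The function
name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def raw_io_requests(file_bytes, max_transfer_bytes):
    """Number of transfers needed if each one moves at most max_transfer_bytes."""
    # Ceiling division using integers only.
    return (file_bytes + max_transfer_bytes - 1) // max_transfer_bytes


if __name__ == "__main__":
    file_size = 100 * MIB  # roughly the typical file size mentioned above
    for label, limit in (("128 KiB (assumed stock MAXPHYS)", 128 * KIB),
                         ("1 MiB (patched MAXPHYS)", 1 * MIB)):
        print(f"{label}: {raw_io_requests(file_size, limit)} requests")
```

For a 100 MiB file this gives 800 requests at 128 KiB and 100 requests at
1 MiB. Whether the web server's reads actually reach the disk at that size
is a separate question that only a test on the real system can answer.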

I'll report back with the results, probably sometime in the next fortnight.

Thanks everyone, thanks Wojciech Puchar,

Danny


-----Original Message-----
From: owner-freebsd-questions at freebsd.org
[mailto:owner-freebsd-questions at freebsd.org] On Behalf Of Matthew Seaman
Sent: Sunday, 28 September 2008 7:30 PM
To: Danny Do
Cc: freebsd-questions at freebsd.org
Subject: Re: Hard disk bottle neck.

Danny Do wrote:
> Hi guys,
>
> I have had this problem for years but couldn't find a way to solve it.
> 
> I have a file server handling large files from 1MByte to 1GByte. 
> 
> Server Info:
> FreeBSD 6.2
> Apache 2.2.9
> 
> DELL PowerEdge 1850
> 2GB RAM (only 184MB is active)
> 6x300MB SCSI 10K RPM RAID5
> Gigabit Ethernet Connection
> 
> My server can output NO MORE than 60Mbps (read only). 
> 
> The bottleneck is the hard disk. If I use ONE connection to download a
> file from my server, the speed can go up to about 400Mbps.
> 
> If I let visitors download using multiple connections, the server 
> cannot output more than 60Mbps.
> 
> My service is similar to rapidshare/megaupload, I am wondering how 
> they configure their servers?
> 
> If I recall correctly, reading the data from the disk doesn't cost much
> time, but seeking to the data does. Correct me if I am wrong: if I
> increase the read buffer size, there would be fewer disk seeks (disk
> accesses). Say the read buffer is 64K; if I increase it to 640K, the
> number of disk seeks would drop by 90%. Thus, more data could be read
> from the hard drive.
> 
> What should I do now?

Try some different webservers. Apache is great, but it is designed to be
maximally flexible and capable of doing anything you can imagine rather than
to be absolutely as fast as possible.

There are some light-weight servers which have put work into optimizing
delivery of static content -- usually spoken of in the context of serving
images but any static files will be suitable material.  Personally, I really
like nginx for this.  Lots of people go for lighttpd and there are a number
of other alternatives in ports.

Also, depending on exactly how much content you have to serve and whether
certain items are very much more popular than others, a reverse proxy /
memory cache (a.k.a. HTTP accelerator) may help.  Varnish is the obvious
candidate here, but you'll have to experiment a bit to see what the optimal
settings are and if it actually helps at all.

If your website runs using a scripting language such as PHP, then another
possibility is memcached -- although described as a cache for dynamically
generated pages, it can cache just about anything, but you will need some
sort of scripting language to interface to it from your web server.  There
are memcached APIs for all popular languages and probably a few you've never
heard of...

The various caching strategies basically work because they keep recently
accessed files in RAM, avoiding an expensive round-trip to the HDD to
retrieve the data (memory access takes nano- or micro- seconds: disk
accesses take milliseconds).  Of course, the OS itself also does exactly the
same thing in a general way, and FreeBSD is already very good in this
respect.  Caching software, however, gives you more control over what gets
cached and for how long, enabling you to tune this specific application for
maximum performance.
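
The idea of keeping recently accessed files in RAM can be sketched as a
least-recently-used (LRU) cache with a byte budget. The Python below is a
toy model, not how Varnish, memcached or the FreeBSD buffer cache are
implemented; the class name, the 1500 MB budget and the request sequence
are all invented for illustration. It tracks only file names and sizes.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Tiny LRU cache bounded by total bytes rather than entry count."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # name -> size, least recently used first

    def access(self, name, size_bytes):
        """Return True on a cache hit, False on a miss (a trip to disk)."""
        if name in self._entries:
            self._entries.move_to_end(name)  # mark as most recently used
            return True
        if size_bytes > self.capacity_bytes:
            return False  # too big to cache at all
        # Evict least recently used entries until the new file fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self._entries[name] = size_bytes
        self.used_bytes += size_bytes
        return False


if __name__ == "__main__":
    MB = 1_000_000
    cache = ByteBudgetLRU(capacity_bytes=1500 * MB)  # assumed RAM left for caching
    requests = ["a", "b", "a", "c", "a", "b"]         # hypothetical 100 MB files
    hits = sum(cache.access(name, 100 * MB) for name in requests)
    print(f"{hits} hits out of {len(requests)} requests")
```

With files of around 100MB, a budget like this holds only about fifteen of
them, so the hit rate depends heavily on whether a small set of files
receives most of the requests.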

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
                                                  Kent, CT11 9PW



