[call for helpers!] Tuning for the Beaver Challenge

Dung Patrick dkt at digitalme.com
Mon Feb 9 08:06:30 PST 2004


Hi

Beaver Challenge 2004 is coming!
Details in http://osuosl.org/benchmarks/bc/

We are preparing the tuning guide, and we definitely need suggestions and comments.

Please see this forum to view the latest tuning guide:
http://osuosl.org/forums/viewforum.php?f=8

Attached is version 0.4 of the tuning guide.

Regards
Patrick

-------------- next part --------------
==================================================
FreeBSD tuning guide for the Beaver Challenge 2004
For "stock" class
Draft ver 0.4
==================================================

=========
Changelog
=========
0.1 Initial release
0.2 Add some notes about the packages for the benchmarking
0.3 sysctl.conf, loader.conf, more tuning on Apache and other minor updates
0.4 Partition order, 2.2.1(cron), 2.2.3 (sysctl), 2.2.5(flags), 2.2.6, 2.2.7

=========================
1. Background information
=========================
The machine:
Dell PowerEdge 2650 
2 - 2.8GHz Intel Xeon processors
2GB of RAM 
5 - 36GB U320 disks @ 10,000 RPM configured as RAID0 stripe 
Adaptec AAC raid controller 
Its full spec can be downloaded here:
http://www.dell.com/downloads/global/products/pedge/en/2650_specs.pdf

According to http://osuosl.org/benchmarks/bc/methodology/ they will run these benchmarks:
Bonnie++ - file system performance.
lmbench - bandwidth and latency performance.
VolanoMark - a Java benchmark.
MySQL server throughput
Apache server throughput 

Their testing script is also available on their web page.
For Bonnie++, their test parameters are:
my $file_size = "4g";
my $file_num= "128";
my $small_min_size = "16";
my $small_max_size = "20000";
my $large_min_size = "20000";
my $large_max_size = "100000";
my $num_dir = "512";
my $test_runs = 1;
my $symlink_runs = 1;
my $hardlink_runs = 1;

=================
2. Tuning section
=================
In this section, we are going to tune the system to get the best benchmark results.

----------------
2.1 Installation
----------------

Partitioning
------------
There will be about 180GB of hard disk space and 2GB of RAM. How should we partition the FreeBSD system? The suggested layout is as follows (in order):

/home	>4GB (their bonnie++ script runs its benchmark on this partition; if we do not create /home, it will be a symbolic link to /usr/home)
/	xxGB
swap	4GB (the machine has 2GB of RAM)
/usr	xxGB (need some space to hold the ports, and possibly to build the jdk)

/home comes first because we want it on the outer part of the disk, where sequential throughput is usually highest. (So /home will be a separate partition instead of a slice?)
Use UFS1 for FBSD4 (no choice), and UFS2 for FBSD5.
Turn on soft updates for all partitions.

Some thoughts:
1) /home can be reformatted to use a different block/fragment size.
Judging from their bonnie script, increasing the block size (the default is 16KB) might help, but I have no actual test results or supporting material.
Unless we have a good reason, we will stick to the default.
2) In the long run, will /usr/src, /usr/ports and /usr/obj make the /usr partition more and more fragmented? Should we make separate partitions for them?

--------------------------
2.2 Tweaking (First stage)
--------------------------

After the installation, we are going to tune in different areas.

2.2.1 Turn off all unneeded services
------------------------------------
Remember for "stock" class:
The tweaks must reflect a system that could be placed into production. 

Turn off these services:
inetd
portmap
sendmail
sshd (we use serial console access?)
cron
usbd? (I doubt the BC machine has any USB devices)
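
The list above could be written in /etc/rc.conf roughly as follows. This is only a sketch: the variable names are the stock rc.conf knobs as we understand them for FreeBSD 4.x (on 5.x portmap is replaced by rpcbind, so the knob would be rpcbind_enable), and whether sshd and cron may really be disabled is still an open question for the "stock" class.

```shell
# /etc/rc.conf fragment (sketch; check /etc/defaults/rc.conf on the target release)
inetd_enable="NO"
portmap_enable="NO"      # FreeBSD 4.x name; 5.x uses rpcbind_enable
sendmail_enable="NONE"   # "NONE" means no sendmail processes are started at all
sshd_enable="NO"         # only if serial console access is confirmed
cron_enable="NO"
usbd_enable="NO"
```

Because rc.conf is plain sh syntax, a typo can be caught before rebooting by sourcing the file in a subshell: sh -c '. /etc/rc.conf'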

Question:
Should we also turn this off for the sake of the benchmarking? (This should not be done on a production server.)
syslog

2.2.2 /etc/fstab
----------------
By default, file systems are mounted with access time (atime) updates enabled.
Let's turn that off. For more information, refer to man 8 mount.

To do it without a reboot:
# mount -o update,noatime /
Then add 'noatime' to the options field of the relevant entries in /etc/fstab,
so that it is applied automatically at the next reboot.
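
As a rough illustration of the fstab change, the helper below appends noatime to the options of every ufs entry. The function name add_noatime and the device names are made up for the example; it is a sketch to be reviewed by hand, not a tool to run blindly against /etc/fstab.

```shell
# Sketch: append noatime to the options of every ufs entry in fstab-format
# text read from stdin. Comments and non-ufs lines (swap, cd9660, ...) pass
# through unchanged. Rewritten lines come out single-space separated.
add_noatime() {
    awk '
        /^[ \t]*#/ || NF < 4 { print; next }
        $3 == "ufs" && $4 !~ /(^|,)noatime(,|$)/ { $4 = $4 ",noatime" }
        { print }
    '
}

# Example with a hypothetical device name:
printf '/dev/aacd0s1a / ufs rw 1 1\n' | add_noatime
# prints: /dev/aacd0s1a / ufs rw,noatime 1 1
```

In practice one would write the result to a scratch file (add_noatime < /etc/fstab > /tmp/fstab.new), compare it with the original, and only then copy it into place.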

2.2.3 /etc/sysctl.conf
----------------------
This is a critical file; much of the tweaking will be done here.

To be completed. Refer to:
Tuning man page
TCP man page
More parameters to tune in http://sinuspl.net/txt/ (look for sysctl.txt) 

kern.ipc.somaxconn=8192 
kern.ipc.maxsockbuf=2097152 
net.inet.ip.portrange.last=30000 
net.inet.tcp.rfc1323=1 
net.inet.tcp.delayed_ack=0 
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=32768
net.inet.udp.recvspace=32768
net.inet.icmp.icmplim=0 
# We will stick to the defaults unless we have any good reasons
#net.inet.udp.maxdgram=
#net.local.stream.sendspace=
#net.local.stream.recvspace=
#net.local.dgram.maxdgram=
#net.local.dgram.recvspace=
# Disable features that we are not going to use in the BC 
net.inet.icmp.drop_redirect=1 
net.inet.icmp.log_redirect=0 
net.inet.ip.redirect=0 
net.inet6.ip6.redirect=0 
net.inet.ip.sourceroute=0 
net.inet.ip.accept_sourceroute=0 
# Turn off ARP wrong interface log messages 
net.link.ether.inet.log_arp_wrong_iface=0 

Questions:
1) Would this be too much buffer space (sendspace+recvspace=64KB) for each connection?
Alexander suggested two ideas for networking:
2) Enable net.inet.tcp.inflight_enable (TCP bandwidth-delay product limiting) and increase net.inet.tcp.slowstart_flightsize=X (where X > 3); and
3) Experiment with net.inet.tcp.newreno=0 (NewReno Fast Recovery algorithm, RFC 2582).
Any ideas?
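
To put question 1 in numbers, here is a small worst-case calculation. The function name and the connection counts are illustrative assumptions; socket buffers are filled on demand, so real usage should be well below this bound.

```shell
# Upper bound on socket buffer memory if every connection filled both its
# send and receive buffers at once.
# $1 = sendspace (bytes), $2 = recvspace (bytes), $3 = concurrent connections
tcp_buf_worst_case_mb() {
    echo $(( ($1 + $2) * $3 / 1048576 ))
}

tcp_buf_worst_case_mb 32768 32768 1024   # prints 64
tcp_buf_worst_case_mb 32768 32768 8192   # prints 512
```

So even several thousand simultaneous connections stay within the 2GB of RAM in the worst case, although 512MB would be a large share of it.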

2.2.4 /boot/loader.conf
-----------------------
This is another place where we will tune.
Some sysctl parameters are read-only at run time and can only be set at boot time.

kern.maxfiles=65535 
kern.maxfilesperproc=32768 
kern.ipc.maxsockets=32768	# Maximum number of sockets
kern.ipc.nmbclusters="16384"	# Set the number of mbuf clusters
#kern.ipc.nmbufs=""		# Set the maximum number of mbufs
# maximum no of sf_bufs (sendfile(2) zero-copy virtual buffers)
# If processes are stuck in the "sfbufa" state, then increase nsfbufs
#kern.ipc.nsfbufs=""		# Set the number of sendfile(2) bufs
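
As a sanity check on the nmbclusters value, the memory it can pin is easy to estimate. The sketch below assumes the usual 2048-byte cluster size (MCLBYTES); the function name is made up for the example.

```shell
# $1 = kern.ipc.nmbclusters, $2 = cluster size in bytes (assumed 2048)
mbuf_cluster_mb() {
    echo $(( $1 * ${2:-2048} / 1048576 ))
}

mbuf_cluster_mb 16384    # prints 32
mbuf_cluster_mb 32768    # prints 64
```

With 2GB of RAM, 32MB of clusters is modest, so there is room to raise the value if netstat -m shows clusters being exhausted during the network tests.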

2.2.5 /etc/make.conf
--------------------
This is also a critical file because we are going to optimize our kernel and packages.
Also refer to [4] and /etc/defaults/make.conf

These are the proposed parameters:
CPUTYPE?=p4 
CFLAGS= -Os -fno-strict-aliasing -fomit-frame-pointer -mcpu=pentium4 -march=pentium4 -pipe
# COPTFLAGS is for compiling just the kernel with special optimizations, should be used instead of CFLAGS
COPTFLAGS= -Os -fno-strict-aliasing -fomit-frame-pointer -mcpu=pentium4 -march=pentium4 -pipe

Other parameters that might be useful:
NOPROFILE=true	# Avoid compiling profiled libraries (do we need this?)
CXXFLAGS+= (do we still need this for -stable?)

Questions:

Should we use -O, -O2 or -Os?
Maybe we use -Os on FBSD4-stable and -O on FBSD5/-current.

In FreeBSD 5.2, /bin and /sbin are now dynamically linked rather than statically linked.
Will this hurt the performance of Bonnie++/lmbench/VolanoMark/MySQL/Apache?
Alexander points out that dynamic linking will hurt the performance of shell scripts: every use of an executable in /bin or /sbin will be a little slower.

2.2.6 Update our source
-----------------------
Let's assume we are allowed to update the FreeBSD source and the ports tree.

The next question is whether we will use (and we do not know whether OSUOSL allows it) FreeBSD-stable instead of FreeBSD 4.9 release, and FreeBSD-current instead of FreeBSD 5.2 release.

We assume the stable branch will make only a small difference in performance.
We are going to use -current if possible.

2.2.7 Make the world and the kernel
-----------------------------------
Now that we have the optimized make.conf, we are going to rebuild the world and build our new kernel.
We have to remove unnecessary options from the kernel config and add new tweaks.

To be removed in both FBSD4/5:
NFS client/server/root
Other filesystems except FFS/UFS
All other SCSI devices except aac
IPv6 (If they are not going to test it)
All other network devices except the network card (Broadcom?)
Leave only I686_CPU
isa? eisa?
usb?
...

To be added in both FBSD4/5: (See LINT)
CPU_FASTER_5X86_FPU	# enables faster FPU exception handler.
CPU_SUSP_HLT		# enables suspend on HALT. If this option is set, CPU enters suspend mode following execution of HALT instruction. (Do we really want this?)
NO_F00F_HACK		# disables the hack that prevents Pentiums (and ONLY
# Pentiums) from locking up when a LOCK CMPXCHG8B instruction is
# executed.  This option is only needed if I586_CPU is also defined,
# and should be included for any non-Pentium CPU that defines it.
CPU_ENABLE_SSE 		# enables SSE/MMX2 instructions support. (It should not be needed normally?)

CPU_UPGRADE_HW_CACHE	# Question: I don't know if it is safe to use this one.
From /usr/src/sys/i386/i386/initcpu.c:
         * OS should flush L1 cache by itself because no PC-98 supports
         * non-Intel CPUs.  Use wbinvd instruction before DMA transfer
         * when need_pre_dma_flush = 1, use invd instruction after DMA
         * transfer when need_post_dma_flush = 1.  If your CPU upgrade
         * product supports hardware cache control, you can add the
         * CPU_UPGRADE_HW_CACHE option in your kernel configuration file.
         * This option eliminates unneeded cache flush instruction(s).

For the kernel config in FreeBSD4:
Use APM or ACPI?

For the kernel config in FreeBSD5:
SCHED_ULE will be used instead of SCHED_4BSD
ZERO_COPY_SOCKETS

Questions:
Should HTT be used together with SMP, or just plain SMP? (Be sure to read /usr/src/sys/i386/conf/NOTES if -current is used.)
Use device polling? (It seems to support only these devices: dc, em, fxp, nge, rl and sis; see man polling.)
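
For reference, the rebuild in this section would follow the usual source-upgrade sequence. The helper below only prints the steps; the kernel config name BEAVER is hypothetical, and /usr/src/UPDATING on the chosen branch remains the authority on the exact order.

```shell
# Print (not run) the usual source-upgrade sequence for a custom kernel.
# $1 = kernel config name under /usr/src/sys/i386/conf (hypothetical: BEAVER)
print_build_steps() {
    kernconf=${1:-GENERIC}
    cat <<EOF
cd /usr/src
make buildworld
make buildkernel KERNCONF=$kernconf
make installkernel KERNCONF=$kernconf
# reboot to single-user mode, then:
mergemaster -p
make installworld
mergemaster
EOF
}

print_build_steps BEAVER
```

Printing rather than executing keeps the example harmless; on the real machine each step would be run by hand so that a failed build is noticed before anything is installed.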

2.2.8 Reboot
------------
We are going to use our new kernel.

---------------------------
2.3 Tweaking (Second Stage) (To be completed)
---------------------------
OSUOSL plans to run these for benchmarking:
Bonnie++ - file system performance.
lmbench - bandwidth and latency performance.
VolanoMark - a Java benchmark.
MySQL server throughput
Apache server throughput

Question:
Are we going to use KSE for all these packages?

2.3.1 Bonnie++
--------------
On FreeBSD-current, a recent build makes Bonnie++ link against libpthread.

When I tested Bonnie++ on my home PC, I got this in syslog, but I do not know what is wrong:
calcru: negative time of 468737024 usec for pid 45411 (bonnie++)

2.3.2 lmbench
-------------
There does not seem to be much that we can do here.

2.3.3 VolanoMark
----------------
We need to compile JDK 1.4.2 ourselves; the package cannot be downloaded from anywhere.

To run volano, I needed to change this (my kernel has IPv6 compiled in):
net.inet6.ip6.v6only=0
or run java with -Djava.net.preferIPv4Stack=true
A patch is in http://force-elite.com/~chip/patches/VolanoMark/freebsd-startup-sun14.patch

Assume the volano java test runs on the loopback interface.
I have installed volano and run a loopback test locally.
It stopped at 140 connections, and I saw a java process in the state maxthr (I guess that means the maximum number of threads).
So I tried increasing the maximum threads per process to 400. (The loopback test should only make 200 connections.)
kern.threads.max_threads_per_proc=400

If the volano java test is going to test network capability, then kern.maxfiles must be increased to at least 10240.
(The startup script will do "ulimit -Sn 10240")
kern.maxfiles=16384
kern.threads.max_threads_per_proc must also be increased, or the maximum number of connections will be bounded.
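
A quick way to check the descriptor limits against what the startup script asks for is sketched below. The function name is an assumption for illustration; on the target box the first two arguments would come from sysctl -n kern.maxfiles and sysctl -n kern.maxfilesperproc.

```shell
# Succeeds if both the system-wide and the per-process descriptor limits
# are at least as large as the number of descriptors required.
# $1 = kern.maxfiles, $2 = kern.maxfilesperproc, $3 = descriptors needed
fd_limits_ok() {
    [ "$1" -ge "$3" ] && [ "$2" -ge "$3" ]
}

fd_limits_ok 65535 32768 10240 && echo "ok for ulimit -Sn 10240"
fd_limits_ok 8192 8192 10240 || echo "too low"
```

Note that the loader.conf values proposed in 2.2.4 (65535 and 32768) already satisfy the 10240 requirement, so the lower kern.maxfiles=16384 here should not replace them.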

Question:
What are these settings? They are not documented.
kern.threads.max_groups_per_proc:
kern.threads.max_threads_hits:

2.3.4 MySQL
-----------
A good place to start is my-huge.cnf or my-large.cnf (in /usr/local/share/mysql).

/etc/my.cnf:
max_connections = 1024 
key_buffer = 64M 
read_buffer_size = 2M 
sort_buffer_size = 8M 
table_cache = 128 
query_cache_limit = 1M 
query_cache_size = 32M 
query_cache_type = 1 
skip-innodb 
log-bin 
skip-name-resolve 

Questions:
1) Use linuxthreads for -stable?
Please have a look here: http://jeremy.zawodny.com/blog/archives/000697.html
2) Where does MySQL (from ports) look for my.cnf (/etc/my.cnf or /usr/local/etc/my.cnf)?

2.3.5 Apache
------------
So far, we do not know whether Apache 1.x, 2.x, or both will be benchmarked.

We should use '--enable-nonportable-atomics=yes' and statically build in all needed modules.

If Apache 2.x is used, the Worker MPM module should be used. Refer to [6]

chipig is working on the kqueue patch for Apache.
He also has the httpready patch:
http://force-elite.com/~chip/patches/httpd/accept_filter/use-httpready-filter.patch
It has been added to Apache 2.1 (CVS HEAD) and backported to 2.0.XX.

We have to make sure HostnameLookups is Off (the default).

Specific to FreeBSD: Accept Filter and Accept Data
For Apache 1.3.x
We will use accept filter module (accf_http)
http://httpd.apache.org/docs/misc/perf-bsd44.html
For Apache 2.0.x
We will use accept data module (accf_data)

Question:
Perhaps for the unlimited class only: should we turn off logging in Apache?
If yes, we would comment out all log-related directives in httpd.conf (and use /dev/null for ErrorLog).

==================
3. Other questions
==================
We know that the machine should use Gigabit Ethernet, but we do not know whether the clients will use Gigabit or Fast Ethernet.

"Benchmarking BSD and Linux" has illustrated some weaknesses in FreeBSD.
Please refer to [5] and see how we can improve.

=============
4. References
=============
[1] Tuning man page
http://www.freebsd.org/cgi/man.cgi?query=tuning&apropos=0&sektion=0&manpath=FreeBSD+5.2-current&format=html

[2] Tcp man page
http://www.freebsd.org/cgi/man.cgi?query=tcp&apropos=0&sektion=0&manpath=FreeBSD+5.2-current&format=html

[3] Tuning FreeBSD for different applications
http://silverwraith.com/papers/freebsd-tuning.php

[4] Optimising FreeBSD and it's kernel
http://silverwraith.com/papers/freebsd-kernel.php

[5] 5.2.1 Tuning Ideas
http://osuosl.org/forums/viewtopic.php?t=56

[6] Apache MPM Worker module
http://httpd.apache.org/docs-2.0/mod/worker.html

