Hints for precision benchmarking...

Mon Jan 26 08:53:27 PST 2004

A number of people have started to benchmark things seriously now,
and run into the usual problem of noisy data preventing any
conclusions.  Rather than repeat myself many times, I decided to
send this email.

I experimented with micro-benchmarking some years back; here are
some bullet points with a lot of what I found out.  You will
not be able to use them all every single time, but the more you
use, the better your ability to detect small differences will be.

*   Disable APM and any other kind of clock fiddling (ACPI?).

*   Run in single-user mode.  Daemons such as cron(8) only
    add noise.

*   If syslog events are generated, run syslogd(8) with an empty
    syslog.conf; otherwise, do not run it at all.

*   Minimize disk I/O; avoid it entirely if you can.

*   Don't mount filesystems you do not need.

*   Mount /, /usr and every other filesystem you can as read-only.
    This removes atime updates and similar writes from your I/O picture.

*   Newfs your R/W test filesystem and populate it from a tar or
    dump file before every run.  Unmount and remount it before starting
    the test.  This results in a consistent filesystem layout.  For
    a worldstone test this would apply to /usr/obj (just newfs and
    mount).  If you want 100% reproducibility, populate your filesystem
    from a dd(1) image (i.e. dd if=myimage of=/dev/ad0s1h bs=1m).

*   Use malloc-backed or preloaded md(4) partitions.

*   Reboot between individual iterations of your test; this gives
    a more consistent state.

*   Remove all non-essential device drivers from the kernel.  For
    instance, if you don't need USB for the test, don't put USB in
    the kernel.  Drivers which attach often have timeouts ticking
    away.

*   Unconfigure hardware you don't use.  Detach disks with atacontrol(8)
    and camcontrol(8) if you do not use them for the test.

*   Do not configure the network unless you are testing it (or only
    after your test, to ship the results off to another computer).

*   Do not run ntpd(8).

*   Put each filesystem on its own disk.  This minimizes jitter from
    head-seek optimizations.

*   Minimize output to serial or VGA consoles.  Sending output to
    files gives less jitter (serial consoles easily become a
    bottleneck).  Do not touch the keyboard while the test is running;
    even <space><back-space> shows up in your numbers.

*   Make sure your test is long enough, but not too long.  If your
    test is too short, the error from timestamping becomes significant.
    If it is too long, temperature changes and drift will affect the
    frequency of the quartz crystals in your computer.  Rule of thumb:
    more than a minute, less than an hour.

*   Try to keep the temperature around the machine as stable as
    possible.  It affects both quartz crystals and disk drive
    algorithms.  If you really want to get nasty, consider stabilized
    clock injection (get an OCXO + PLL and inject its output into the
    clock circuits instead of the motherboard xtal; send me an email).

*   Run at least 3, but preferably more than 20, iterations of both
    the "before" and the "after" code.  Interleave them if possible
    (i.e. do not run 20 x before, then 20 x after); this makes it
    possible to spot environmental effects.  Interleave 3:3 rather
    than 1:1; this makes it possible to spot interaction effects.

    My preferred pattern is bababa{bbbaaa}*, where b is "before" and
    a is "after".  This gives a hint after the first 1+1 runs (so you
    can stop if it goes entirely the wrong way), a standard deviation
    after the first 3+3 (a good indication of whether a long run is
    worthwhile), and trend and interaction numbers later on.

*   Use /usr/src/tools/tools/ministat to see if your numbers are
    statistically significant.  If you have forgotten, or never learned
    about, standard deviation and Student's t, consider buying "The
    Cartoon Guide to Statistics" (ISBN 0062731025); highly recommended.
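
To put rough numbers on the "too short" end of the test-length rule of
thumb: if the start and end timestamps are each quantized to one clock
tick, the measured interval can be off by up to about one tick, so the
worst-case relative error is roughly tick/duration.  A minimal Python
sketch, assuming a hypothetical 10 ms tick (substitute your own timer's
resolution); it does not model the drift that sets the upper limit.

```python
def quantization_error(tick_s: float, duration_s: float) -> float:
    """Worst-case relative error of an interval timed with a tick_s clock.

    Start and end stamps are each quantized to one tick, so the measured
    difference can be off by up to about one tick.
    """
    if tick_s <= 0 or duration_s <= 0:
        raise ValueError("tick and duration must be positive")
    return tick_s / duration_s


def min_duration(tick_s: float, max_rel_error: float) -> float:
    """Shortest run whose quantization error stays below max_rel_error."""
    if tick_s <= 0 or max_rel_error <= 0:
        raise ValueError("tick and error bound must be positive")
    return tick_s / max_rel_error


if __name__ == "__main__":
    TICK = 0.01  # hypothetical 10 ms timer resolution
    for seconds in (1, 60, 3600):
        err = quantization_error(TICK, seconds)
        print(f"{seconds:5d} s run: up to {err:.4%} timing error")
    print(f"for < 0.1% error, run at least {min_duration(TICK, 0.001):.0f} s")
```

The one-minute floor leaves a wide margin over this bound, which also
absorbs per-run startup and timestamping overhead.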
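
The bababa{bbbaaa}* run order from the interleaving hint is easy to
generate mechanically, which avoids slip-ups during a long session.  A
minimal Python sketch; the function names are hypothetical, and "b"/"a"
stand for running the before and after code, which is left to you.

```python
def run_order(blocks: int) -> str:
    """Return the schedule bababa followed by `blocks` repeats of bbbaaa.

    'b' = run the "before" code, 'a' = run the "after" code.
    """
    if blocks < 0:
        raise ValueError("blocks must be >= 0")
    return "bababa" + "bbbaaa" * blocks


def checkpoints(order: str):
    """Yield (run number, before runs done, after runs done) after each run."""
    n_before = n_after = 0
    for run, label in enumerate(order, start=1):
        if label == "b":
            n_before += 1
        else:
            n_after += 1
        yield run, n_before, n_after


if __name__ == "__main__":
    order = run_order(6)  # 3 + 6 * 3 = 21 runs of each variant
    print(order)
    for run, nb, na in checkpoints(order):
        if (nb, na) in {(1, 1), (3, 3)}:
            print(f"after run {run}: {nb}+{na} done, time to take a look")
```

With run_order(6) you get 21 runs of each variant; the 1+1 checkpoint
falls after run 2 and the 3+3 checkpoint after run 6.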
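
For a feel for what a significance check like ministat's involves, here
is a minimal Python sketch of a two-sample Student's t test with pooled
standard deviation at 95% confidence.  It is not ministat's code: the
critical values are the standard two-sided 95% table, for more than 30
degrees of freedom it conservatively reuses the df = 30 value, and the
sample numbers are made up.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for df = 1..30.
T95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
       2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
       2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]


def t_crit95(df: int) -> float:
    # Beyond df = 30 reuse the df = 30 value: slightly too wide, i.e. conservative.
    return T95[min(df, 30) - 1]


def compare(before, after):
    """Return (difference of means, 95% margin, significant?) with pooled variance."""
    nb, na = len(before), len(after)
    if nb < 2 or na < 2:
        raise ValueError("need at least 2 runs of each variant")
    df = nb + na - 2
    pooled_var = ((nb - 1) * statistics.variance(before)
                  + (na - 1) * statistics.variance(after)) / df
    margin = t_crit95(df) * math.sqrt(pooled_var * (1 / nb + 1 / na))
    diff = statistics.mean(after) - statistics.mean(before)
    return diff, margin, abs(diff) > margin


if __name__ == "__main__":
    before = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8]  # made-up run times, seconds
    after = [9.6, 9.7, 9.5, 9.8, 9.6, 9.7]
    diff, margin, significant = compare(before, after)
    print(f"difference {diff:+.3f} s +/- {margin:.3f} s, significant: {significant}")
```

If the difference is smaller than the margin, the data cannot tell the
two versions apart: you need more runs, less noise, or both.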

Enjoy, and please share any other tricks you might develop!

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.