Feedback for performance tracker

Wed Aug 15 11:51:10 PDT 2007

On Wed, Aug 15, 2007 at 11:04:29AM +0200, Erik Cederstrand wrote:
> 1) Which benchmarks would you like to see being run?
> 2) Which tests do you perform regularly, which the tracker could automate?
> 3) Which features in the web interface would you find most helpful?

Here's what Robert Watson last posted on this subject (on freebsd-arch@).
I hope that he doesn't mind the re-post.

 Date: Wed, 4 Jul 2007 12:58:44 +0100 (BST)
 From: Robert Watson <rwatson at FreeBSD.org>
 In-Reply-To: <20070704105525.GU45894 at elvis.mu.org>
 Message-ID: <20070704124833.W37059 at fledge.watson.org>
 References: <20070702230728.E552 at 10.0.0.1> <20070703181242.T552 at 10.0.0.1>
 	<20070704105525.GU45894 at elvis.mu.org>
 Cc: arch at freebsd.org

 I also worry about the narrowness of the benchmarking we're doing -- however, 
 it's hardly new.  We do best at optimizing where we have clearly defined 
 targets and measures of performance.  The four-times increase in MySQL select 
 performance is a direct result of Kris taking on scalability measurement and 
 helping developers with optimization ideas try them out, profile them, etc.

 A point I've made at a number of devsummits and elsewhere is that what we 
 really need now is more people to "take ownership" of the performance of 
 workloads they care about.  They don't need to be the people to do the 
 optimizations, but if they could help manage outstanding patchsets, measure 
 the change in performance over time, get involved in profiling, etc, then
 that will have a big effect on performance for the workload, as has
 happened with MySQL.

 Here are some workloads I'd really like to see people take responsibility for:

 - Flat file Apache performance, perhaps with Apachebench or another HTTP
    throughput measurement tool.

 - Dynamic Apache performance, perhaps using some combination of
    Apache/PHP/MySQL.

 - BIND query performance with a few realistic-looking workloads.

 - PostgreSQL performance along the same lines as current MySQL performance.
    Kris has waved his hands a bit in this direction already and much of
    the MySQL measurement work can be reused.

 - Some sort of compiler/build/etc test -- buildworld of HEAD tends to be
    highly variable over time as components change, compilers change, etc,
    but optimizing build performance still has a big benefit for developers.
    Perhaps how long it takes to do the post-buildtools bit of buildworld
    for a fixed FreeBSD version.

 - Network micro-benchmarks, including loopback TCP and UDP, multi-machine
    TCP and UDP, both single stream and multi-stream.

 - UI interactivity testing -- how long it takes to go from a simulated
    keypress from the keyboard device to an input program running in an
    xterm and other related latency tests that will be affected by scheduling,
    IPC, and so on.
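
As a rough illustration of the single-stream loopback TCP item in the list
above, here is a minimal sketch in Python. It is not taken from Robert's
mail: the function name, the 8 MB transfer size, and the 64 KB chunk size
are all assumptions chosen for the example, and a serious micro-benchmark
would repeat the run many times and probably use a dedicated tool.

```python
import socket
import threading
import time


def loopback_tcp_throughput(total_bytes=8 * 1024 * 1024, chunk_size=64 * 1024):
    """Send total_bytes over one loopback TCP stream.

    Returns (bytes_received, elapsed_seconds). Name and defaults are
    illustrative assumptions, not part of any existing tool.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = [0]  # list so the receiver thread can update it

    def sink():
        # Accept one connection and count bytes until the sender closes it.
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk_size)
                if not data:
                    break
                received[0] += len(data)

    reader = threading.Thread(target=sink)
    reader.start()

    payload = b"x" * chunk_size
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    reader.join()  # wait until the receiver has drained the stream
    elapsed = time.perf_counter() - start
    server.close()
    return received[0], elapsed


if __name__ == "__main__":
    nbytes, seconds = loopback_tcp_throughput()
    print(f"{nbytes / seconds / 1e6:.1f} MB/s over loopback TCP")
```

The timer stops only after the receiver has seen end-of-stream, so the
figure covers delivery rather than just the time to queue the data. A
multi-stream variant would presumably run several of these pairs at once.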

 There seem to be two parts of owning a benchmark:

 - Establishing baselines over time -- how do FreeBSD 4.8, 5.5, 6.0, 6.1,
    6.2, 6-STABLE weekly, 7-CURRENT weekly, and maybe a Linux or NetBSD
    version perform for the workload using an otherwise identical
    configuration.

 - Measurement and feedback -- identifying bottlenecks, working with
    developers to measure the results of specific optimizations, etc,
    across the life cycle of the patch.
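
The baseline part could be tracked with something as simple as the sketch
below, which reports each version's median result as a percent change
against a chosen baseline. Everything in it is hypothetical: the function
name, the version labels, and especially the numbers, which are made-up
placeholders and not measurements of any system.

```python
from statistics import median


def compare_to_baseline(results, baseline):
    """Return {version: percent change of its median vs. the baseline median}.

    `results` maps a version label to a list of repeated measurements
    (higher is better, e.g. transactions per second).
    """
    base = median(results[baseline])
    return {
        version: 100.0 * (median(samples) - base) / base
        for version, samples in results.items()
    }


# Placeholder data for illustration only; NOT real benchmark results.
results = {
    "version-A": [100.0, 102.0, 98.0],
    "version-B": [110.0, 108.0, 112.0],
    "version-C": [150.0, 149.0, 151.0],
}

for version, change in compare_to_baseline(results, "version-A").items():
    print(f"{version:10s} {change:+6.1f}%")
```

Using the median of several runs rather than a single run is one way to
damp run-to-run noise; it is a design choice for the sketch, not a
recommendation from the quoted mail.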

 If Kris can motivate such a dramatic improvement in MySQL performance, it 
 seems likely that people doing similar things with other workloads could
 have similar effects.  And, as you say, breadth is really important --
 tuning the system for MySQL is very important, but has it generally hurt
 or helped other workloads?  In most cases, I'd expect work to date to
 have helped, because it involved lowering overhead, etc.  However, when
 we get into schedulers, space/time trade-offs, and so on, then that
 balance will become harder to strike.

 Robert N M Watson
 Computer Laboratory
 University of Cambridge