Continual benchmarking / regression testing?

Alan Somers asomers at freebsd.org
Tue Jan 7 17:01:39 UTC 2014


On Tue, Jan 7, 2014 at 9:11 AM, Julio Merino <julio at meroh.net> wrote:
> On Tue, Jan 7, 2014 at 4:09 PM, Ivan Voras <ivoras at freebsd.org> wrote:
>> Hello,
>>
>> Is someone working on a continual benchmarking / regression testing
>> project for FreeBSD? I seem to recall there was a post several months
>> ago but I can't find it.
>
> See http://wiki.freebsd.org/TestSuite for the current efforts.

I think that Kyua is less than ideal for benchmarking.  It could be
extended, but there are two fundamental differences between a test
framework and a benchmark framework:

1) Benchmarks are slow.  Not only that, but they usually come with a
bewildering array of options (file size, I/O size, etc) that
exponentially increase the time required to do a comprehensive run of
all available tests with all available options.  So, you don't want to
run all of the benchmarks all of the time.  In contrast, tests are
usually fast, and you usually want to run all of them all of the time.

2) Tests usually have a binary output.  Did it pass or didn't it?
Kyua has a few other possible outcomes (expected failure, skipped,
broken), but it's still a short list.  In contrast, benchmarks usually
have a variable output, expressed as one (or more) real numbers.
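
To illustrate the difference, here's a sketch of the two kinds of
results (the record layouts are made up, not taken from Kyua or any
real framework):

  # Hypothetical test outcome: one discrete verdict.
  test_result = { name: "lib/libc/strtol_test", outcome: :passed }

  # Hypothetical benchmark result: real-valued metrics, plus the
  # configuration that produced them.
  bench_result = {
    name:    "fs/sequential_write",
    config:  { file_size_mb: 1024, io_size_kb: 64 },
    metrics: { throughput_mb_s: 412.7, p99_latency_ms: 8.3 },
  }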

IMO, the extensions that would be required for Kyua to function as a
benchmark framework would be too intrusive; they would make it more
difficult to maintain Kyua's role as a test framework, and add nothing
to Kyua's testing abilities.  I think that a separate benchmarking
framework would be better.

The best benchmarking framework that I know of is the Phoronix Test
Suite (http://www.phoronix-test-suite.com/).  It's cross-platform, it
has a decent report generator, including a public list of results at
http://openbenchmarking.org/, and it has a huge library of benchmark
programs.  However, it has several drawbacks.  Many of the benchmark
programs are of poor quality, IMHO; they seem to get committed
without sufficient analysis to make sure that they're testing
something useful.  Also, while the PTS does some hardware profiling
before each run (see representative output at
http://openbenchmarking.org/result/1401071-UT-BUKOWSKIW54 ), it is
insufficient for a real scientific analysis of the hardware's
contribution to the scores.  For example, there is no way to query
openbenchmarking.org for a graph of all the results for test X on
systems with CPU Y, hard drive Z, and RAM speed Q, plotted against
the amount of installed memory, with multiple results drawn as range
bars.  I would really like to be able to do that (roughly the kind of
query I sketch at the end of this paragraph).  In fact, the
cross-platform
nature of the PTS makes it harder to collect such information.
Finally, the PTS doesn't have any ability to run tests on a cluster of
machines.  That is critical for testing any subsystem that involves
networking, for example NFS.
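
To make the query idea concrete, here is roughly what I'd like to be
able to ask, sketched in Ruby against an in-memory list of result
records.  Everything here is made up; openbenchmarking.org offers
nothing like it today, which is exactly my complaint:

  # Purely hypothetical result records; real ones would come out of a
  # results database.
  results = [
    { test: "test-x", cpu: "cpu-y", disk: "drive-z",
      ram_speed: "speed-q", installed_gb: 16, score: 412.7 },
    # ... many more records, one per run ...
  ]

  # Select every result for test X on the fixed hardware
  # configuration...
  matching = results.select do |r|
    r[:test] == "test-x" && r[:cpu] == "cpu-y" &&
      r[:disk] == "drive-z" && r[:ram_speed] == "speed-q"
  end

  # ...and group by installed memory.  Each group would be drawn as
  # one range bar in the graph described above.
  by_memory = matching.group_by { |r| r[:installed_gb] }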

For these reasons, I set out to write my own framework.  At a very
high level, it handles common functionality like reporting results,
commanding slave nodes, and profiling the system
hardware, etc.  The individual benchmark programs are each written as
Ruby scripts that are executed by the framework.  Importantly, the
framework does not include any kind of built-in sequencer.  There is
no way to say "run all benchmarks".  I envision that a technician
would be responsible for selecting which benchmarks to run with which
configuration options based on an organization's current needs.  In a
CI setting, there would be a short sh script that would run several
benchmarks in series.  In any case, the result report format does not
assume anything about how the tests were sequenced.  Each result
enters the database as a separate record with full information about
its configuration and the hardware and software environment under
which it ran.
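
To make that concrete, here is a rough sketch of what one of those
Ruby benchmark scripts could look like.  The interface (a run entry
point, a reporter object, a configuration hash) is made up and
simplified; it only shows the division of labor I have in mind: the
script runs one workload and reports named, real-valued metrics,
while the framework supplies the configuration, profiles the
environment, and writes the database record:

  #!/usr/bin/env ruby
  # Hypothetical sequential-write benchmark.  The framework (not
  # shown) loads this script, passes in the chosen configuration, and
  # persists whatever metrics are reported.

  require 'benchmark'

  def run(config, reporter)
    file    = config.fetch(:path, "/tmp/seqwrite.dat")
    size_mb = config.fetch(:file_size_mb, 1024)
    bs      = config.fetch(:io_size_kb, 64) * 1024

    elapsed = Benchmark.realtime do
      system("dd", "if=/dev/zero", "of=#{file}",
             "bs=#{bs}", "count=#{(size_mb * 1024 * 1024) / bs}")
    end

    # One real-valued metric; the framework attaches the configuration
    # and the hardware/software profile before the record is stored.
    reporter.report(:throughput_mb_s, size_mb / elapsed)
  ensure
    File.delete(file) if file && File.exist?(file)
  end

The short sh script for a CI setting would then just invoke the
framework several times, once per benchmark and configuration.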

Unfortunately, my framework is extremely incomplete.  It's not even
good enough for internal use, much less a wider audience.  And I fear
that my bosses won't give me any more time to work on it.  It's also
written in Ruby and uses STAF to command slave nodes, which the
FreeBSD community might not be excited about.  However, if there is
any interest, I can ask for permission to share my design as a
starting point for a more general framework.

-Alan

>
> --
> Julio Merino / @jmmv
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"

