realistic web benchmark

Tue Feb 3 02:45:30 PST 2004

"Igor Shmukler" wrote:
> > In the past, I've used webstone for some web performance benchmarking.
> > Recently, we've also been contacted with regard to a test suite named Web
> > Polygraph.
> 
> I probably was not 100% clear when I asked the question. Sorry. I am looking for
> a test that would tell me how a server will perform in the field, on a WAN.
> 
> Webstone is a bit naive about benchmarking. It allows comparison of server
> scores; however, the fact that a particular OS/server combination scores higher
> does not guarantee that it will have higher throughput in real-life situations.
> 
> I scanned briefly through Web Polygraph's documentation and it seems like a more
> powerful tool, but I could not find how to emulate large delays and packet loss
> common for WANs.

That's what "dummynet" does for you: it lets you simulate lossy
traffic, etc.; I personally don't find it to be that meaningful,
since the only thing that's actually going to see lossy traffic
is the L4/L7 load balancer sitting in front of your server farm,
so unless you are testing your load balancer, you really aren't
going to get much information out of a lossy network test.

Also, your most critical point of failure is going to be really
slow client connections tying up all your sockets, and thereby
starving the faster client connections that are stuck behind them.
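To make the starvation point concrete, here is a toy Python model, not a model of any particular server: a fixed pool of sockets, and a client that arrives while every socket is busy is simply refused (no listen backlog, no timeouts; those simplifications, the function name `simulate`, and the workload numbers are all assumptions for illustration). Times are integer milliseconds so the bookkeeping is exact.

```python
import heapq


def simulate(max_sockets, clients):
    """Toy socket-pool model: return {kind: (served, refused)}.

    clients: iterable of (arrival_ms, hold_ms, kind). A client that arrives
    while all `max_sockets` sockets are held is refused outright (hypothetical
    simplification: no listen backlog, no idle timeouts).
    """
    busy = []  # min-heap of release times for occupied sockets
    stats = {}
    for arrival, hold, kind in sorted(clients):
        # Free every socket whose holder has finished by this arrival time.
        while busy and busy[0] <= arrival:
            heapq.heappop(busy)
        served, refused = stats.get(kind, (0, 0))
        if len(busy) < max_sockets:
            heapq.heappush(busy, arrival + hold)
            stats[kind] = (served + 1, refused)
        else:
            stats[kind] = (served, refused + 1)
    return stats


if __name__ == "__main__":
    # Hypothetical workload over 20 s with 8 sockets: one slow WAN client per
    # second that holds its socket for 60 s, and a fast client every 100 ms
    # that needs its socket for only 50 ms.
    slow = [(1000 * s, 60_000, "slow") for s in range(20)]
    fast = [(100 * f + 50, 50, "fast") for f in range(200)]
    print("fast only:  ", simulate(8, fast))
    print("slow + fast:", simulate(8, slow + fast))
```

With only fast clients every request is served; once eight slow clients have each grabbed a socket (after about 7 seconds here), every later fast client is turned away even though the fast clients alone would barely load the server.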

If you care about comparison points, webstone and Microsoft's WAST
tool are going to be what people will be comparing you against when
they run their own tests on an evaluation.

http://www.mindcraft.com/webstone/
http://www.microsoft.com/technet/itsolutions/intranet/downloads/webstres.asp

The WAST application from Microsoft has the drawback that it
tends to fill up your server with connections in FIN-WAIT-2 state.  This
is because it doesn't do the full handshake when shutting down connections;
it just RSTs them.  The problem with this approach is that an RST, unlike
a FIN, is never retransmitted, so if the RST is lost, your end of the
connection never finds out that the client has gone away.  Julian Elischer
did some patches for the TCP stack while at Whistle that address this issue
the right way, by pretending not to have received the ACK of your FIN that
moved the connection from FIN-WAIT-1 to FIN-WAIT-2.  This is the same fix
that Windows NT uses.  If you don't make this fix, then you
should expect that your server will "fill up" with idle connections when
you are running WAST, and eventually stop serving pages.
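For anyone who wants to see the difference between an orderly close and a reset, here is a minimal Python sketch. It is not WAST's code, and how WAST actually produces its RSTs is not something I know; it just uses one common client-side technique, setting `SO_LINGER` with a zero timeout so that `close()` sends an RST instead of a FIN. The helper names `abortive_close` and `what_server_sees` are made up for illustration, and loopback never drops packets, so this shows the RST itself, not the lost-RST case.

```python
import socket
import struct


def abortive_close(sock):
    """Close `sock` with an RST instead of the FIN handshake.

    SO_LINGER with l_onoff=1 and l_linger=0 tells the stack to discard
    any unsent data and reset the connection when close() is called.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def what_server_sees(abortive):
    """Connect over loopback, close the client side, report the server's view."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        conn, _ = listener.accept()
        with conn:
            conn.settimeout(5)  # never hang the demo
            if abortive:
                abortive_close(client)  # RST, no FIN (the behaviour described above)
            else:
                client.close()          # orderly release: sends a FIN
            try:
                data = conn.recv(1024)
            except ConnectionResetError:
                return "reset"
            return "eof" if data == b"" else "data"


if __name__ == "__main__":
    print("orderly close: ", what_server_sees(False))
    print("abortive close:", what_server_sees(True))
```

The orderly close shows up on the server as end-of-file (`recv` returns `b""`); the abortive close shows up as `ConnectionResetError`, and that single RST is the only notice the server gets.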

The primary value in Polygraph is stress-testing.  It's mostly for
proxy servers, and its main value lies in setting up cache-busting
scenarios by pre-loading a "hot" cache to force the limitation to be
your proxy server, load balancer, or whatever.  It reads mostly as a
statement about the undesirability of such devices with regard to the
end-to-end nature of the net "as it's supposed to be".

http://www.web-polygraph.org/

It's actually really amusing that Network Appliance "cheats" on the
Polygraph benchmarks with their caching proxy appliance by doing
random page replacement in order to defeat Polygraph's attempt to
bust the proxy cache by guessing its size and then making the
workload such that there are 100% cache misses.
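Why random replacement wins here can be shown with a small Python sketch. The workload is an assumption, not Polygraph's actual generator: a working set exactly one object larger than the cache, requested in a loop. Under LRU each request asks for precisely the object that was evicted most recently, so after warm-up nothing ever hits; a cache that evicts a random victim keeps some objects by luck and scores hits. The function names are hypothetical, and this only illustrates the kind of random replacement described above, not NetApp's code.

```python
import random
from collections import OrderedDict


def lru_hits(capacity, requests):
    """Count hits for a least-recently-used cache of `capacity` objects."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[key] = True
    return hits


def random_hits(capacity, requests, seed=0):
    """Count hits for a cache that evicts a uniformly random victim."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    slots = []                 # cached keys, indexable for random eviction
    present = set()            # same keys, for O(1) membership tests
    hits = 0
    for key in requests:
        if key in present:
            hits += 1
            continue
        if len(slots) < capacity:
            slots.append(key)
        else:
            victim = rng.randrange(capacity)
            present.discard(slots[victim])
            slots[victim] = key
        present.add(key)
    return hits


if __name__ == "__main__":
    # Hypothetical cache-busting loop: 101 distinct objects, cache holds 100.
    requests = list(range(101)) * 50
    print("LRU hits:   ", lru_hits(100, requests))
    print("random hits:", random_hits(100, requests))
```

LRU scores zero hits on this workload however long it runs, which is the sense in which a "pessimal" random policy ends up beating the textbook one.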

That you could have such a pessimal algorithm, and that that was
the best way to get a good score on the benchmark, to me, says a
lot about the actual value of the benchmark as a benchmark.

If you are just into stress testing, rather than running Polygraph,
you'd likely be better off running http_load on a bunch of boxes:

http://www.acme.com/software/http_load/
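http_load itself is a small C program that takes a file of URLs plus options such as `-parallel` and `-fetches`. The Python below is not http_load, just a minimal sketch of the same idea (N concurrent workers each doing one fetch per connection, then totals and a rate), with a throwaway local server so it runs standalone. All names in it are hypothetical, and for real stress testing you would run copies on several client machines against the real server, as suggested above.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any http_proxy settings: hit the server under test directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout=5.0):
    """Fetch one URL; return the body size in bytes, or None on error."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return len(resp.read())
    except OSError:  # URLError, HTTPError, timeouts and resets are all OSErrors
        return None


def run_load(url, parallel, fetches):
    """Issue `fetches` GETs using `parallel` concurrent workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        sizes = list(pool.map(lambda _: fetch(url), range(fetches)))
    elapsed = time.perf_counter() - start
    ok = [s for s in sizes if s is not None]
    return {
        "fetches": len(ok),
        "errors": len(sizes) - len(ok),
        "bytes": sum(ok),
        "fetches_per_sec": len(ok) / elapsed if elapsed > 0 else float("inf"),
    }


class HelloHandler(BaseHTTPRequestHandler):
    """Trivial target so the sketch runs without a real web server."""
    BODY = b"hello\n"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_demo_server():
    """Start HelloHandler on an ephemeral loopback port; return (server, url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"


if __name__ == "__main__":
    server, url = start_demo_server()
    try:
        print(run_load(url, parallel=8, fetches=200))
    finally:
        server.shutdown()
        server.server_close()
```

Threads in one Python process will run out of steam long before a real server does, which is exactly why the advice is to spread the load across a bunch of boxes.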

Or if you aren't averse to a commercial box, you might want to
consider the Web Avalanche products from Spirent, which they got
in their acquisition of Caw Networks:

http://www.caw.com/

It's basically a single box that's the equivalent of a lab worth
of UNIX systems running a bunch of copies of http_load.  It's very
good at making web servers and proxies, etc., fall over dead if
they have any issues at all.

-- Terry