What version of FBSD does Yahoo run?
TM4525 at aol.com
Thu Oct 7 14:35:28 PDT 2004
In a message dated 10/7/04 4:06:34 PM Eastern Daylight Time, drosih at rpi.edu
writes:
Here's one benchmark, showing UDP packet/second generation
rate from userland on a dual xeon machine under various
target loads:
Desired    Optimal    5.x-UP     5.x-SMP    4.x-UP     4.x-SMP
 50000      50000      50000      50000      50000      50000
 75000      75000      75001      75001      75001      75001
100000     100000     100000     100000     100000     100000
125000     125000     125000     125000     125000     125000
150000     150000     150015     150014     150015     150015
175000     175000     175008     175008     175008     169097
200000     200000     200000     179621     181445     169451
225000     225000     225022     179729     181367     169831
250000     250000     242742     179979     181138     169212
275000     275000     242102     180171     181134     169283
300000     300000     242213     179157     181098     169355
That does show results for both single-processor (5.x-UP, 4.x-UP)
and multiprocessor (5.x-SMP, 4.x-SMP) benchmarks. It may be
that he ignored the table as soon as he read "dual Xeon".
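The quoted numbers read like the output of a paced UDP sender that tries to hit each "desired" rate and records what it actually achieved. A minimal sketch of such a generator (hypothetical; not the actual tool used in the thread, and the destination port and payload size are arbitrary):

```python
import socket
import time

def paced_udp_send(target_pps, duration, host="127.0.0.1", port=9999, size=18):
    """Busy-poll pacer: try to emit target_pps UDP packets/second for
    `duration` seconds, and report the packet rate actually achieved."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * size
    interval = 1.0 / target_pps
    sent = 0
    start = time.perf_counter()
    next_send = start
    while True:
        now = time.perf_counter()
        if now - start >= duration:
            break
        if now >= next_send:
            sock.sendto(payload, (host, port))
            sent += 1
            next_send += interval   # pace against the target rate
    elapsed = time.perf_counter() - start
    sock.close()
    return sent / elapsed
```

Comparing the returned rate against the target for increasing targets would produce a table shaped like the one above: achieved tracks desired until the system can no longer keep up.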
--------------------------------------------
I haven't seen this before. If I had, I would immediately ask:
- What is the control here? What does your "benchmark" test?
- Is this on a gigabit link? What are the packet sizes? Was network
availability a factor in limiting the test results?
- What does "target load" mean? Does it mean don't try to send
more than that rate? If so, what does reaching it show? If you
don't measure the utilization it takes to saturate your "target",
I don't see the point of having it.
- It seems that the only thing you could learn from this test is
the maximum pps you could achieve unidirectionally out of a
system. Why is that useful, since it's hardly ever the requirement
unless you're building a traffic generator?
- A relatively slow machine (a 1.7GHz Celeron with a 32-bit/33MHz
fxp NIC running 4.9) pushes over 250 Kpps, so why is your machine,
with seemingly superior hardware, so slow?
- The test seems backwards. What you are doing in this test is
not something that any device does. If you want to measure user-space
performance, it has to include receive and transmit response, not
just transmit. Perhaps it indirectly shows process-switching performance,
but it doesn't tell you much about network performance, since transmit
is far less demanding than receive in terms of processing requirements.
When you transmit you know exactly what you have; when you receive
you have to do a lot of checking and testing to see what needs to
be done.
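The transmit/receive asymmetry described above can be illustrated with a toy UDP echo pair (a hypothetical sketch, not the benchmark from the thread): the transmit side just fires a payload, while the receive side has to validate each datagram before it can act on it.

```python
import socket
import threading

def start_echo_server(expected_len):
    """Bind a UDP echo server on an ephemeral loopback port.
    The receive path checks each datagram (length, payload marker)
    before replying -- work the transmit path never has to do.
    Returns (port, stop_event, thread)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    stop = threading.Event()

    def loop():
        srv.settimeout(0.2)
        while not stop.is_set():
            try:
                data, peer = srv.recvfrom(2048)
            except socket.timeout:
                continue
            if len(data) != expected_len:   # length sanity check
                continue
            if data[:1] != b"\x42":         # payload marker check
                continue
            srv.sendto(data, peer)          # only valid packets get echoed
        srv.close()

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return port, stop, t

def echo_once(port, payload, timeout=1.0):
    """Transmit side: just send and wait -- no per-packet validation."""
    c = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    c.settimeout(timeout)
    c.sendto(payload, ("127.0.0.1", port))
    data, _ = c.recvfrom(2048)
    c.close()
    return data
```

Even in this trivial example the receive loop carries all the conditional logic; a real stack's receive path (demultiplexing, checksums, reassembly) is heavier still.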
When I test network performance, I want to isolate kernel
performance if possible. If you're evaluating the system for use as
a network device (such as a router, a bridge, or a firewall), you
have to eliminate userland from the formula. The interaction between
user space and the kernel is a key factor in your "benchmark" that is absent
in a pure network device, so it's not useful in testing pure stack
performance.
Also, there is a significant problem with "maximum packets/second" tests.
As you reach high levels of saturation, you often get abnormal processing
requirements that skew the results. For example, as bus saturation climbs,
the processing requirements change: I/Os take longer waiting for access
to the bus, transmit queues may fill, and so on. Testing under such
unusual conditions may exercise abnormal recovery code that would never
run on a machine under "normal" loads.
A better way to test is to measure utilization under realistically normal
conditions. Machines can get very inefficient if their recovery code is poor,
but that may not matter, since no one realistically runs a machine at 98%
utilization.
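The "utilization under normal load" approach could be sketched as follows: offer a fixed, moderate packet rate and report how much CPU it costs, rather than pushing until something breaks. (A hypothetical illustration; the destination, payload, and rates are placeholders, and process CPU time stands in for whole-system utilization.)

```python
import socket
import time

def utilization_at_load(pps, duration, host="127.0.0.1", port=9999):
    """Send UDP packets at a fixed moderate rate and return
    (achieved_pps, cpu_utilization), where utilization is
    process CPU time divided by wall-clock time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * 64
    interval = 1.0 / pps
    sent = 0
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    next_send = wall0
    while True:
        now = time.perf_counter()
        if now - wall0 >= duration:
            break
        if now < next_send:
            time.sleep(next_send - now)   # idle between packets, no spinning
        sock.sendto(payload, (host, port))
        sent += 1
        next_send += interval
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    sock.close()
    return sent / wall, cpu / wall
```

A low utilization figure at the offered load leaves headroom visible; comparing utilization across systems at the same moderate rate avoids the recovery-code artifacts that saturation tests trip over.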
Assuming that your benchmark does test something, your "results"
seem to show that a uniprocessor machine is substantially
more efficient than an SMP box. It also seems that the gap between
UP and SMP performance has widened in 5.x. Wasn't one of the goals
of 5.x to substantially improve SMP performance? This seems to show
the opposite.
TM