CPU utilisation cap?

lukem.freebsd at cse.unsw.edu.au
Mon Oct 25 06:56:04 PDT 2004


On Mon, 25 Oct 2004, Robert Watson wrote:
> A couple of thoughts, none of which points at any particular red flag, but
> worth thinking about:
>
> - You indicate there are multiple if_em cards in the host -- can you
>  describe the network topology?  Are you using multiple cards, or just
>  one of the nicely equipped ones?  Is there a switch involved, or direct
>  back-to-back wires?

I have 4 Linux boxes (don't blame me!) generating UDP traffic through an 
8-port HP ProCurve gigabit switch. The software I am using is ipbench (see 
ipbench.sourceforge.net for details).

I am presently only using one NIC at a time, though I intend to measure 
routing performance soon, which will eliminate the user-space component 
(and hopefully some scheduler effects) and might shed more light on 
things.

> - Are the packet sources generating the packets synchronously or
>  asynchronously: i.e., when a packet source sends a UDP packet, does it
>  wait for the response before continuing, or keep on sending?   If
>  synchronously, are you sure that the wires are being kept busy?

It is not a ping-pong benchmark. The traffic is generated continuously; 
each packet is timestamped as it is sent, and the timestamp is compared 
with the receipt time to get a round-trip time (I didn't bother including 
the latency information in my post to freebsd-net).
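
To illustrate the idea (this is not ipbench's actual wire format, just a 
rough sketch in C; it relies on the sending box also being the one that 
receives the echo, so a single clock is used at both ends):

/*
 * Hypothetical sketch: the sender stamps the start of each UDP payload,
 * the echoed copy comes back unchanged, and the difference at receipt
 * is the round-trip time.  The payload must be at least
 * sizeof(struct timeval) bytes.
 */
#include <sys/time.h>
#include <string.h>

/* Sender side: stamp the payload just before sendto(). */
void
stamp_packet(char *payload)
{
        struct timeval tv;

        gettimeofday(&tv, NULL);
        memcpy(payload, &tv, sizeof(tv));
}

/* Receiver side: called on the echoed packet; returns RTT in microseconds. */
long
packet_rtt(const char *payload)
{
        struct timeval sent, now;

        memcpy(&sent, payload, sizeof(sent));
        gettimeofday(&now, NULL);
        return (now.tv_sec - sent.tv_sec) * 1000000L +
            (now.tv_usec - sent.tv_usec);
}

Both helpers would be called from the generator's send and receive loops 
respectively.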

> - Make sure your math on PCI bus bandwidth accounts for packets going in
>  both directions if you're actually echoing the packets.  Also make sure
>  to include the size of the ethernet frame and any other headers.

The values I have quoted (550 Mbit/s etc.) are the throughput received 
back at the Linux box after echoing. Therefore we can expect double that 
on the PCI bus, plus overheads.
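
To put rough numbers on that (assuming the 550 Mbit/s figure is UDP 
payload and ~1500-byte frames): 550 Mbit/s in plus 550 Mbit/s back out is 
about 1100 Mbit/s of payload crossing the card and the bus, before adding 
the 14-byte Ethernet, 20-byte IP and 8-byte UDP headers per packet and the 
descriptor DMA traffic, which add a few percent more. That is already past 
the ~1 Gbit/s theoretical limit of a 32-bit/33 MHz PCI slot, so the cards 
really need to be in 64-bit and/or 66 MHz slots for this test.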

> - If you're using SCHED_ULE, be aware that its notion of "nice" is a
>  little different from the traditional UNIX notion, and attempts to
>  provide more proportional CPU allocation.  You might try switching to
>  SCHED_4BSD.  Note that there have been pretty large scheduler changes in
>  5.3, with a number of the features that were previously specific to
>  SCHED_ULE being made available with SCHED_4BSD, and that a lot of
>  scheduling bugs have been fixed.  If you move to 5.3, make sure you run
>  with 4BSD, and it would be worth trying it with 5.2 to "see what
>  happens".

I very strongly agree that it sounds like a scheduling effect. The 5.2.1 
kernel (which is what I am using) is built with SCHED_4BSD already. It 
will be interesting to see if the new scheduler makes a difference.

> - It would be worth trying the test without the soaker process but instead
>  a sampling process that polls the kernel's notion of CPU% measurement
>  every second.  That way if it does turn out that ULE is unnecessarily
>  giving CPU cycles to the soaker, you can still measure w/o "soaking".
>
> - What does your soaker do -- in particular, does it make system calls to
>  determine the time frequently?  If so, the synchronization operations
>  and scheduling cost associated with that may impact your measurements.
>  If it just spins reading the tsc and outputting once in a while, you
>  should be OK WRT this point.

I will look into this. I didn't write the code, so I'm not sure exactly 
what it does. From what I understand it uses a calibrated tight loop, so 
it shouldn't need to make any syscalls while it is running, but I will 
check it out anyway. I have been considering implementing this using a 
cycle count register, but have avoided that so far for portability 
reasons.
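
For the record, here is roughly what I have in mind for a TSC-based soaker 
(a hypothetical sketch, not the existing ipbench code; it assumes an x86 
with a constant-rate TSC, a single CPU or a pinned process, and a 
separately calibrated TSC_HZ):

/*
 * Spin reading the TSC; small deltas mean we kept running, large deltas
 * mean we were preempted.  The ratio of "running" cycles to elapsed
 * cycles is the CPU share the soaker received.
 */
#include <stdio.h>
#include <stdint.h>

static inline uint64_t
rdtsc(void)
{
        uint32_t lo, hi;

        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
}

#define TSC_HZ          2400000000ULL   /* assumed CPU clock; calibrate this */
#define GAP_THRESHOLD   20000ULL        /* cycles; a larger gap == preempted */

int
main(void)
{
        uint64_t prev = rdtsc(), start = prev;
        uint64_t running = 0;

        for (;;) {
                uint64_t now = rdtsc();

                if (now - prev < GAP_THRESHOLD)
                        running += now - prev;  /* we kept the CPU */
                prev = now;

                if (now - start >= TSC_HZ) {    /* roughly one second */
                        printf("soaker got %.1f%% of the CPU\n",
                            100.0 * running / (now - start));
                        running = 0;
                        start = now;
                }
        }
}

The only syscall is the once-a-second printf, so it shouldn't disturb the 
measurement much.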

> - Could you confirm using netstat -s statistics that a lot of your packets
>  aren't getting dropped due to full buffers on either send or receive.
>  Also, do you have any tests in place to measure packet loss?  Can you
>  confirm that all the packets you send from the Linux boxes are really
>  sent, and that given they are sent, that they arrive, and vice versa on
>  the echo?  Adding sequence numbers and measuring the mean sequence
>  number difference might be an easy way to start if you aren't already.

I get numbers for both the packets transmitted and the packets received 
(albeit in terms of throughput). What I see is little to no packet loss 
below the MLFRR (maximum loss-free receive rate), and obviously packets 
get lost after that.
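
Adding sequence numbers as you suggest would not be hard; a rough sketch 
of the receive-side bookkeeping (again hypothetical, not ipbench's actual 
packet format):

/*
 * Put a 32-bit sequence number at the start of each UDP payload and
 * count gaps on the receive side to estimate loss.
 */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct loss_counter {
        uint32_t next_seq;      /* next sequence number we expect  */
        uint64_t received;      /* datagrams that actually arrived */
        uint64_t lost;          /* gaps observed in sequence space */
};

/* Call once per received datagram; buf is the UDP payload. */
void
loss_update(struct loss_counter *lc, const void *buf, size_t len)
{
        uint32_t seq;

        if (len < sizeof(seq))
                return;                         /* runt payload, ignore */
        memcpy(&seq, buf, sizeof(seq));
        seq = ntohl(seq);

        if (seq >= lc->next_seq) {
                lc->lost += seq - lc->next_seq; /* treat the gap as loss */
                lc->next_seq = seq + 1;
        }
        /* else: late or duplicate packet; don't move next_seq backwards */
        lc->received++;
}

A late (reordered) packet will already have been counted as lost by the 
time it arrives, so heavy reordering would inflate the loss figure; a real 
version would want a small reorder window.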

Thanks for the help!

-- 
Luke

