packet drop with intel gigabit / marvell gigabit

Gary Thorpe gthorpe at myrealbox.com
Thu Mar 23 05:40:22 UTC 2006


[No subject in first one, sorry for repost]
 
Jin Guojun [VFFS] wrote:

> You are far away from the real world. This has been explained a million
> times, just like I teach intern students every summer :-)
>
> First of all, DDR400 and a 200 MHz bus mean nothing -- a DDR266 + 500 MHz
> CPU system can outperform a DDR400 + 1.7 GHz CPU system.

Given the same chipset and motherboard, no: DDR400 has more bandwidth and lower latency than DDR266. Given different chipsets/motherboards, it may be true. However, one could say with equal accuracy that a 500 MHz processor can outperform the same family running at 1.7 GHz under some conditions, yet few people would buy the 500 MHz part over the 1.7 GHz one for performance alone.
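
As a back-of-the-envelope check on the raw numbers (a sketch only, assuming the standard 64-bit DIMM data path; sustained figures will always be lower than this):

/* ddr_peak.c -- rough theoretical peak bandwidth of DDR memory.
 * Assumes a standard 64-bit (8-byte) data path per transfer; real
 * sustained numbers will be lower. */
#include <stdio.h>

static double peak_mb_s(double mt_per_s)
{
    return mt_per_s * 8.0;      /* MT/s * 8 bytes per 64-bit transfer = MB/s */
}

int main(void)
{
    printf("DDR266 (PC2100): ~%.0f MB/s peak\n", peak_mb_s(266.0));
    printf("DDR400 (PC3200): ~%.0f MB/s peak\n", peak_mb_s(400.0));
    return 0;
}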

> Another example:
> Ixxxx 2 CPU was designed with 3 levels of cache. Supposedly
> level 1 to level 2 takes 5 cycles,
> level 2 to level 3 takes 11 cycles.
> What would you expect the CPU-to-memory time (in cycles) to be -- CPU to
> level 1 is one cycle? You would expect a total of 17 to 20 cycles, but it
> actually takes 210 cycles due to some design issues.
> Now your 1.6 GB/s is reduced to 16 MB/s or even worse, based on this
> factor alone.

1.6 GB/s is the system bus bandwidth; cache behaviour does not change that raw figure. DDR400 has a theoretical peak of 3.2 GB/s, which is only attainable for long sequential accesses of either reads or writes, not a mix of both. DMA should be able to get near that limit (long, sequential, read-only or write-only per transfer), so a NIC with bus-mastering DMA should be able to use the memory bandwidth effectively.
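
To put rough numbers on the receive path (a sketch only: it assumes the classic DMA-into-mbuf-then-copyout path and ignores caches, headers and descriptor traffic):

/* gige_budget.c -- back-of-the-envelope memory traffic for receiving at
 * gigabit line rate.  Assumed path: NIC DMA write into an mbuf, then a
 * CPU copyout (read mbuf + write user buffer), i.e. roughly 3 bytes of
 * memory traffic per byte received. */
#include <stdio.h>

int main(void)
{
    const double wire_mb_s    = 1000.0 / 8.0;  /* 1 Gb/s ~= 125 MB/s */
    const double traffic_mult = 3.0;           /* DMA write + copy read + copy write */
    const double ddr400_peak  = 3200.0;        /* MB/s, theoretical */

    double needed = wire_mb_s * traffic_mult;
    printf("memory traffic at line rate: ~%.0f MB/s (%.0f%% of DDR400 peak)\n",
           needed, 100.0 * needed / ddr400_peak);
    return 0;
}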

> Number of other factors affect memory bandwidth, such as bus arbitration.
> Have you done any memory benchmark on a system before doing such simple
> calculation?

No, those are just theoretical values telling you the limits of performance. I assume that a decent implementation can reach 75% of the theoretical limit at least some of the time under good conditions (such as DMA).

>
> Secondly, DMA moves data from the NIC to an mbuf; then who moves the data
> from the mbuf to the user buffer? Not a human -- it is the CPU. While DMA
> is moving data, can the CPU move data simultaneously?
> DMA takes both I/O bandwidth and memory bandwidth. If your system has
> only 16 MB/s of memory bandwidth, your network throughput is less than
> 8 MB/s, typically below 6.4 MB/s.
> If you cannot move data away from the NIC fast enough, what happens?
> Packet loss!

True, but would this type of packet loss even be measured by the OS? Packet loss as reported by the OS means packets were dropped in the software portion of the network stack, right? That implies the NIC had no problem delivering them to the OS, and the OS had problems delivering them to the user process.
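
On FreeBSD the two cases can usually be told apart: netstat -i shows interface-level input errors (Ierrs), while netstat -s -p udp reports datagrams dropped due to full socket buffers. The same UDP counters can be read programmatically; a minimal sketch, assuming the struct udpstat exported via the net.inet.udp.stats sysctl in <netinet/udp_var.h>:

/* udp_drops.c -- FreeBSD sketch: read the kernel's UDP statistics via
 * sysctl and print the counter for datagrams dropped because the
 * receiving socket buffer was full (the same number netstat -s -p udp
 * reports). */
#include <sys/param.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <netinet/udp.h>
#include <netinet/udp_var.h>
#include <stdio.h>

int main(void)
{
    struct udpstat st;
    size_t len = sizeof(st);

    if (sysctlbyname("net.inet.udp.stats", &st, &len, NULL, 0) == -1) {
        perror("sysctlbyname(net.inet.udp.stats)");
        return 1;
    }
    printf("received datagrams:           %lu\n", (u_long)st.udps_ipackets);
    printf("dropped, full socket buffers: %lu\n", (u_long)st.udps_fullsock);
    return 0;
}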

You are arguing that the bandwidth is not sufficient for the processor to do this copy-out (or page loan-out, i.e. zero copy, which is only memory-management trickery), so the software has to drop packets from mbufs when more packets arrive for UDP. Enough bandwidth is theoretically available for this (much more than required); whether the actual sustained bandwidth is insufficient is another matter, but I doubt that any reasonable (i.e. not junk) system delivers as little as a quarter of its theoretical bandwidth.
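
If the drops do turn out to be at the socket buffer, enlarging it is the cheap first experiment. A minimal sketch (the 1 MB size is only illustrative; the kernel clamps the request to kern.ipc.maxsockbuf):

/* rcvbuf.c -- sketch: enlarge a UDP socket's receive buffer so bursts
 * are queued instead of dropped when the application falls behind. */
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int s, sz;
    socklen_t len;

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s == -1) {
        perror("socket");
        return 1;
    }

    sz = 1 * 1024 * 1024;               /* ask for a 1 MB receive buffer */
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) == -1)
        perror("setsockopt(SO_RCVBUF)");

    len = sizeof(sz);
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &sz, &len) == 0)
        printf("effective SO_RCVBUF: %d bytes\n", sz);

    close(s);
    return 0;
}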

>
> That is why his CPU utilization was low: there was not much data crossing
> the CPU. So that is why I asked him for the CPU utilization first, then
> the chipset. These are the basic steps for diagnosing network performance.
> If you know the CPU and chipset of a system, you will know the network
> performance ceiling for that system, guaranteed. But it does not guarantee
> you can reach that ceiling, especially over OC-12 (622 Mb/s) high-speed
> networks. That requires intensive tuning knowledge for the current TCP
> stack, which is well explained on the Internet by searching for "TCP
> tuning".

In this case, bandwidth should not be a factor (16 MB/s is low; disks regularly double that with ease). The 1 Gb/s NIC is not being fully used here (< 40 MB/s), and the processor is mostly idle.



