packet drop with Intel gigabit / Marvell gigabit
Jin Guojun [VFFS]
g_jin at lbl.gov
Wed Mar 22 04:26:57 UTC 2006
You are far away from the real world. This has been explained a million
times, just as I teach the intern students every summer :-)
First of all, DDR400 and a 200 MHz bus by themselves mean nothing -- a
DDR266 system with a 500 MHz CPU can outperform a DDR400 system with a
1.7 GHz CPU. Another example:
The Ixxxx 2 CPU was designed with three levels of cache. Supposedly:
Level 1 to level2 takes 5 cycles
Level 2 to level 3 takes 11 cycles
What would you expect the CPU-to-memory time (in cycles) to be? CPU to
level 1 is one cycle, so you would expect about 17 to 20 cycles in total.
But it actually takes 210 cycles, due to some design issues.
Now your 1.6 GB/s has been reduced to 16 MB/s or even worse, based on
this factor alone.
A number of other factors affect memory bandwidth as well, such as bus
arbitration. Have you done any memory benchmarking on a system before
doing such simple math?
Secondly, DMA moves data from the NIC to an mbuf; then who moves the data
from the mbuf to the user buffer?
Not a human -- the CPU. And while DMA is moving data, can the CPU move
data at the same time?
DMA takes both I/O bandwidth and memory bandwidth. If your system has
only 16 MB/s of memory bandwidth, your network throughput is less than
8 MB/s, typically below 6.4 MB/s.
If you cannot move data away from the NIC fast enough, what happens?
That is why his CPU utilization was low: there was not much data for the
CPU to work on.
So, that is why I asked him for the CPU utilization first, and then the
chipset. These are the basic steps in diagnosing network performance.
If you know the CPU and chipset of a system, you will know the network
ceiling for that system, guaranteed. But that does not guarantee you can
reach that ceiling, especially over OC-12 (622 Mb/s) and faster networks.
That requires intensive tuning knowledge for the current TCP stack, which
is well explained on the Internet; search for "TCP tuning".
Gary Thorpe wrote:
> I thought all modern NICs used bus mastering DMA i.e. not dependent on
> CPU for data transfers? In addition, the available memory bandwidth
> for modern CPU's/systems is well over 100 MB/s. DDR400 is 400 MB/s
> (megabytes per second). Bus mastering DMA will be limited by the
> memory or IO bus bandwidth primarily. The system bus bandwidth cannot
> be the problem either: his motherboard's lowest front side bus speed
> is 200 MHz * 64-bit width = 1.6 GB/s (gigabytes per second) of peak
> system bus bandwidth.
> The limitation of 32-bit/33 MHz PCI is 133 MB/s (again, megabytes not
> bits) maximum. Gigabit ethernet requires 125 MB/s (not Mb/s) maximum
> bandwidth: 32/33 PCI has enough for bursts, but bus contention with
> disk bandwidth will reduce the sustained bandwidth. The motherboard in
> question has an option for integrated gigabit LAN which may bypass the
> shared PCI bus altogether (or it might not).
>
> Anyway, the original problem was packet loss and not bandwidth. His CPU
> is mostly idle, so that cannot be the reason for packet loss. If 32/33
> PCI can sustain 133 MB/s then it cannot be a problem, because he needs
> less than this. If it cannot, then packets will arrive from the network
> faster than they can be moved from the board into memory, which would
> cause the packet loss. Otherwise, his system is capable of achieving
> what he wants in theory, and the suboptimal behavior may be due to
> hardware (e.g. the PCI bus not being able to reach 133 MB/s sustained)
> or software limitations (e.g. an inefficient operating system).
More information about the freebsd-performance mailing list