Frequent hiccups on the networking layer
Garrett Wollman
wollman at bimajority.org
Wed Apr 29 05:08:04 UTC 2015
<<On Tue, 28 Apr 2015 17:06:02 -0400 (EDT), Rick Macklem <rmacklem at uoguelph.ca> said:
> There have been email list threads discussing how allocating 9K jumbo
> mbufs will fragment the KVM (kernel virtual memory) used for mbuf
> cluster allocation and cause grief.
The problem is not KVA fragmentation -- the clusters come from a
separate map which should prevent that -- it's that clusters have to
be physically contiguous, and an active machine is going to have
trouble with that. The fact that 9k is a goofy size (two pages plus a
little bit) doesn't help matters.
The other side, as Neel and others have pointed out, is that it's
beneficial for the hardware to have a big chunk of physically
contiguous memory to dump packets into, especially with various kinds
of receive-side offloading.
I see two solutions to this, but don't have the time or resources (or,
frankly, the need) to implement them (and both are probably required
for different situations):
1) Reserve a big chunk of physical memory early on for big clusters.
How much this needs to be will depend on the application and the
particular network interface hardware, but you should be thinking in
terms of megabytes or (on a big server) gigabytes. Big enough to be
mapped as superpages on hardware where that's beneficial. If you have
aggressive LRO, "big clusters" might be 64k or larger in size.
2) Use the IOMMU -- if it's available, which it won't be when running
under a hypervisor that's already using it for passthrough -- to
obviate the need for physically contiguous pages; then the problem
reduces to KVA fragmentation, which is easier to avoid in the
allocator.
> As far as I know (just from email discussion, never used them myself),
> you can either stop using jumbo packets or switch to a different net
> interface that doesn't allocate 9K jumbo mbufs (doing the receives of
> jumbo packets into a list of smaller mbuf clusters).
Or just hack the driver to not use them. For the Intel drivers this
is easy, and at least for the hardware I have there's no benefit to
using 9k clusters over 4k; for Chelsio it's quite a bit harder.
-GAWollman
More information about the freebsd-net mailing list