Updating our TCP and socket sysctl values...

Sun Mar 20 17:49:52 UTC 2011

On Sat, Mar 19, 2011 at 10:47 PM, George Neville-Neil
<gnn at neville-neil.com> wrote:

>
>
> On Mar 20, 2011, at 08:13 , Navdeep Parhar wrote:
>
> > On Fri, Mar 18, 2011 at 11:37 PM, George Neville-Neil
> > <gnn at neville-neil.com> wrote:
> >>
> >> Howdy,
> >>
> >> I believe it's time for us to upgrade our sysctl values for TCP sockets
> >> so that they are more in line with the modern world.  At the moment we
> >> have these limits on our buffering:
> >>
> >> kern.ipc.maxsockbuf: 262144
> >> net.inet.tcp.recvbuf_max: 262144
> >> net.inet.tcp.sendbuf_max: 262144
> >>
> >> I believe it's time to up these values to something that's in line with
> >> higher speed local networks, such as 10G.  Perhaps it's time to move
> >> these to 2MB instead of 256K.
> >>
> >> Thoughts?
> >
> > 256KB seems adequate for 10G (as long as the consumer can keep
> > draining the socket rcv buffer fast enough).  If you consider 2 x
> > bandwidth delay product to be a reasonable socket buffer size then
> > 256K allows for 10G networks with ~100us delays.  Normally the delay
> > is _way_ less than this for 10G and even 256K may be overkill (but
> > this is ok, the kernel has tcp_do_autorcvbuf on by default).
> >
> > While we're here discussing defaults, what about nmbclusters and
> > nmbjumboXX?  Now those haven't kept up with modern machines (imho).
> >
>
> Yes we should also up the nmbclusters, IMHO, but I wasn't going to
> put that in the same bucket with the TCP buffers just yet.
> On 64 bit/large memory machines you could make the nmbclusters
> far higher than our current default.  I know people who just set
> that to 1,000,000 by default.
>
> If people are also happy to up nmbclusters I'm willing to conflate
> that with this.
>
>
A more modest but nonetheless significant increase could also be possible on
i386 machines.  If you go back to r129906, wherein we switched to using UMA
for allocating mbufs and mbuf clusters, and read it carefully, you'll find
that there was a subtle mistake made in the changes to the sizing of the
kmem_map, or the "kernel heap".  Prior to r129906, the overall size of the
kmem map was based on the limits on mbufs and mbuf clusters PLUS the amount
of kernel heap that was desired for everything else.  After r129906, the
limits on mbufs and mbuf clusters no longer made any difference to the size
of the kmem map.  The reason is that the limit on mbuf clusters is factored
into the autosizing too early: it is added to the minimum "kernel heap"
size, not the desired size.  The end result is that mbufs, mbuf clusters,
and everything else were made to compete for a smaller kmem map.
In short, r129906 should have increased VM_KMEM_SIZE_MAX from its current
limit of 320MB.
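
The ordering problem can be modeled in a few lines.  This is only a sketch
under stated assumptions, not the kernel's code: the helper name
kmem_autosize, its parameters, the 4KB page size, and the one-page-per-cluster
accounting are all hypothetical choices made for illustration.  Only the
320MB cap comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096ULL	/* assumed i386 page size */

/*
 * Simplified model of kmem map autosizing.  Hypothetical helper for
 * illustration only; it is not the kernel's code.
 */
static uint64_t
kmem_autosize(uint64_t min_bytes, uint64_t nmbclusters, uint64_t phys_pages,
    uint64_t scale, uint64_t max_bytes)
{
	uint64_t size, desired;

	assert(scale > 0);

	/* Step 1: the cluster limit is added to the minimum size (too early). */
	size = min_bytes + nmbclusters * EX_PAGE_SIZE;

	/* Step 2: the desired size, a fraction of physical memory, wins if larger. */
	desired = (phys_pages / scale) * EX_PAGE_SIZE;
	if (desired > size)
		size = desired;

	/* Step 3: the cap is applied last; a cap of 0 means "no cap" here. */
	if (max_bytes > 0 && size > max_bytes)
		size = max_bytes;

	return (size);
}
```

Taking, purely as example inputs, 2GB of RAM (524288 pages), a 12MB minimum,
a scale of 3, and the 320MB cap, the model returns 320MB whether the cluster
limit is 25600 or 65536.  The desired size (about 682MB) already exceeds the
minimum plus clusters, so the cluster limit never reaches the final result.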

I'd be curious if people running i386-based network servers have any
problems with using

#ifndef VM_KMEM_SIZE_MAX
#define    VM_KMEM_SIZE_MAX    ((VM_MAX_KERNEL_ADDRESS - \
    VM_MIN_KERNEL_ADDRESS + 1) * 3 / 5)
#endif

in place of

#ifndef VM_KMEM_SIZE_MAX
#define    VM_KMEM_SIZE_MAX    (320 * 1024 * 1024)
#endif

The only real downside to this change is that it reduces the kernel virtual
address space available for thread stacks and for 9KB and 16KB jumbo frames.
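
For a sense of scale, here is the proposed expression evaluated for one
assumed layout: a 1GB kernel address space at the top of a 4GB address
space.  The EX_ constants and the helper name are illustrative only; the real
bounds come from VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS and depend
on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bounds for illustration: 1GB of kernel address space. */
#define EX_VM_MIN_KERNEL_ADDRESS 0xC0000000ULL
#define EX_VM_MAX_KERNEL_ADDRESS 0xFFFFFFFFULL

/* Hypothetical helper: three fifths of the kernel address range, in bytes. */
static uint64_t
proposed_kmem_size_max(uint64_t min_addr, uint64_t max_addr)
{
	assert(max_addr >= min_addr);

	/* Same arithmetic as the proposed VM_KMEM_SIZE_MAX definition. */
	return ((max_addr - min_addr + 1) * 3 / 5);
}
```

Under those assumptions the cap comes to 644245094 bytes, about 614MB in
place of 320MB, which would leave roughly 410MB of kernel virtual address
space for everything else.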

Alan