svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys

Wed Nov 14 02:10:21 UTC 2012

Andre, do you think the variable "realmem" could be exported under a 
name like kmemsize?

Or maybe a function call to subr_param.c?

The reason I ask is that I would like to scale things like number of 
default sysv semaphores to something like 64 per 1GB of "realmem".

Thoughts?

The reason I'm interested in this is because we just had a user run out 
of sysvsems on a machine with 256GB ram.

-Alfred
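
For illustration, the scaling rule proposed above (64 semaphores per 1GB
of "realmem") could be sketched roughly as follows. This is only a sketch
under stated assumptions: the function name, the floor parameter and the
byte-based argument are hypothetical, and the real "realmem" variable may
be kept in different units.

```c
#include <stdint.h>

/* Proposed ratio from the message above: 64 semaphores per 1GB. */
#define SEMS_PER_GB	64

/*
 * Hypothetical helper: derive a default semaphore count from the
 * amount of real memory, given here in bytes (an assumption).
 * "floor" is an assumed lower bound so that small machines still
 * get a usable default.
 */
uint64_t
scaled_sem_default(uint64_t realmem_bytes, uint64_t floor)
{
	uint64_t gb = realmem_bytes >> 30;	/* whole GB, rounded down */
	uint64_t n = gb * SEMS_PER_GB;

	return (n > floor ? n : floor);
}
```

With this rule a 256GB machine would get 256 * 64 = 16384 semaphores by
default.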

On 11/12/12 2:49 AM, Andre Oppermann wrote:
> On 12.11.2012 09:47, Andre Oppermann wrote:
>> Author: andre
>> Date: Mon Nov 12 08:47:13 2012
>> New Revision: 242910
>> URL: http://svnweb.freebsd.org/changeset/base/242910
>>
>> Log:
>>    Base the mbuf related limits on the available physical memory or
>>    kernel memory, whichever is lower.
>
> The commit message is a bit terse so I'm going to explain in more
> detail:
>
> The overall mbuf related memory limit must be set so that mbufs
> (and clusters of various sizes) can't exhaust physical RAM or KVM.
>
> I've chosen a limit of half the physical RAM or KVM (whichever is
> lower) as the baseline.  In any normal scenario we want to leave
> at least half of the physmem/kvm for other kernel functions and
> userspace to prevent it from swapping like hell.  Via a tunable
> it can be upped to at most 3/4 of physmem/kvm.
>
> Out of the overall mbuf memory limit I've chosen 2K clusters and 4K
> (page size) clusters to get 1/4 each because these are the most
> heavily used mbuf sizes.  2K clusters are used for MTU 1500 ethernet
> inbound packets.  4K clusters are used whenever possible for sends
> on sockets and thus outbound packets.
>
> The larger cluster sizes of 9K and 16K are limited to 1/6 of the
> overall mbuf memory limit.  Again, when jumbo MTUs are used these
> large clusters will end up only on the inbound path.  They are not
> used on outbound; there it's still 4K.  Yes, that will stay that
> way because otherwise we run into lots of complications in the
> stack.  And it really isn't a problem, so don't make a scene.
>
> Previously the normal mbufs (256B) weren't limited at all.  This
> is wrong as there are certain places in the kernel that on allocation
> failure of clusters try to piece together their packet from smaller
> mbufs.  The mbuf limit is the number of all other mbuf sizes together
> plus some more to allow for standalone mbufs (ACK for example) and
> to send off a copy of a cluster.  FYI: Every cluster eventually also
> has an mbuf associated with it.
>
> Unfortunately there isn't a way to set an overall limit for all
> mbuf memory together as UMA doesn't support such a limit.
>
> Let's work out a few examples on sizing:
>
> 1GB KVM:
>  512MB limit for mbufs
>  419,430 mbufs
>   65,536 2K mbuf clusters
>   32,768 4K mbuf clusters
>    9,709 9K mbuf clusters
>    5,461 16K mbuf clusters
>
> 16GB RAM:
>  8GB limit for mbufs
>  33,554,432 mbufs
>   1,048,576 2K mbuf clusters
>     524,288 4K mbuf clusters
>     155,344 9K mbuf clusters
>      87,381 16K mbuf clusters
>
> These defaults should be sufficient for even the most demanding
> network loads.  If you do run into these limits you probably know
> exactly what you are doing and you are expected to tune those
> values for your particular purpose.
>
> There is a side-issue with maxfiles as it relates to the maximum
> number of sockets that can be opened at the same time.  With today's
> web servers and proxy caches there may be some 100K or
> more sockets open.  Hence I've divorced maxfiles from maxusers as
> well.  There is a relationship of maxfiles with the callout callwheel
> though which has to be investigated some more to prevent ridiculous
> values from being chosen.
>
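
The cluster counts in the sizing examples quoted above can be reproduced
with a few integer divisions. The following is a minimal sketch of that
arithmetic, not the committed kernel code: the struct, its field names
and the function name are hypothetical, and the limit on plain 256B mbufs
is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical container for the derived limits. */
struct mbuf_limits_sketch {
	uint64_t maxmem;	/* overall mbuf memory limit, bytes */
	uint64_t clusters_2k;	/* 2K clusters */
	uint64_t clusters_4k;	/* 4K (page size) clusters */
	uint64_t clusters_9k;	/* 9K jumbo clusters */
	uint64_t clusters_16k;	/* 16K jumbo clusters */
};

/*
 * Baseline: half of physical RAM or KVM, whichever is lower.
 * 2K and 4K clusters get 1/4 of that each, 9K and 16K get 1/6 each.
 * All divisions round down.
 */
struct mbuf_limits_sketch
mbuf_limits_calc(uint64_t physmem, uint64_t kvm)
{
	struct mbuf_limits_sketch l;
	uint64_t base = (physmem < kvm) ? physmem : kvm;

	l.maxmem = base / 2;
	l.clusters_2k = l.maxmem / 4 / 2048;
	l.clusters_4k = l.maxmem / 4 / 4096;
	l.clusters_9k = l.maxmem / 6 / (9 * 1024);
	l.clusters_16k = l.maxmem / 6 / (16 * 1024);
	return (l);
}
```

Calling mbuf_limits_calc() with 1GB for both arguments gives 65536,
32768, 9709 and 5461 clusters, matching the "1GB KVM" example.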