svn commit: r242847 - in head/sys: i386/include kern

Alfred Perlstein bright at mu.org
Sun Nov 11 16:53:54 UTC 2012


I think there are two issues here.

One: you have a much better idea of how to tune nmbclusters than I do. Cool! Please put that into the code. I really think that's great, and the time you've put into giving it serious thought is helpful to all.

Two: you want to divorce nmbclusters (and therefore maxsockets and some other tunables) from maxusers, even though that has been the way to flip a big switch for ages now. This, I think, is very wrong.

"oh you only have to change 1 thing!"

Wait... What was that sound?  Oh, it was a toilet flushing away 15 years of mailing list information, FAQs, and user knowledge, all because the word "maxusers" is no longer hip to the community. That is bad. Please don't do that. 




On Nov 11, 2012, at 2:53 AM, Peter Wemm <peter at wemm.org> wrote:

> On Sun, Nov 11, 2012 at 1:41 AM, Albert Perlstein <bright at mu.org> wrote:
>> The real conversation goes like this:
>> 
>> user: "Why is my box seeing terrible network performance?"
>> bsdguy: "Increase nmbclusters."
>> user: "what is that?"
>> bsdguy: "Oh those are the mbufs, just tell me your current value."
>> user: "oh it's like 128000"
>> bsdguy: "hmm try doubling that, go sysctl kern.ipc.nmbclusters=512000 on the
>> command line."
>> user: "ok"
>> .... an hour passes ...
>> user: "hmm now I can't fork any more copies of apache.."
>> bsdguy: "oh, ok, you need to increase maxproc for that."
>> user: "so sysctl kern.ipc.maxproc=10000?"
>> bsdguy: "no... one second..."
>> ....
>> bsdguy: "ok, so that's sysctl kern.maxproc=10000"
>> user: "ok... bbiaf"
>> ....
>> user: "so now i'm getting log messages about can't open sockets..."
>> bsdguy: "oh you need to increase sockets bro... one second..."
>> user: "sysctl kern.maxsockets?"
>> bsdguy: "oh no.. it's actually back to kern.ipc.maxsockets"
>> user: "alrighty then.."
>> ....
>> ....
>> bsdguy: "so how is freebsd since I helped you tune it?"
>> user: "well i kept hitting other resource limits, boss made me switch to
>> Linux, it works out of the box and doesn't require an expert tuner to run a
>> large scale server.  Y'know as a last ditch effort I looked around for this
>> 'maxusers' thing but it seems like some eggheads retired it and instead of
>> putting my job at risk, I just went with Linux, no one gets fired for using
>> Linux."
>> bsdguy: "managers are lame!"
>> user: "yeah!  managers..."
>> 
>> -Alfred
> 
> Now Albert.. I know that deliberately playing dumb is fun, but there
> is no network difference between doubling "kern.maxusers" in
> loader.conf (the only place it can be set; it isn't runtime tunable)
> and doubling "kern.ipc.nmbclusters" in the same place.  We've always
> allowed people to fine-tune derived settings at runtime where it is
> possible.
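
(For reference, the boot-time/runtime split described here looks like the following in practice; the values are purely illustrative, not recommendations:)

```shell
# /boot/loader.conf -- boot-time tunables, read once by the loader.
# kern.maxusers is ONLY settable here; it is not a writable runtime sysctl.
kern.maxusers="512"
# Defaults derived from maxusers can also be overridden here individually:
kern.ipc.nmbclusters="262144"
```

The big switch and the fine-grained knobs are not mutually exclusive: maxusers scales everything at boot, and any derived setting that the kernel exposes as writable can still be fine-tuned afterwards with sysctl(8).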
> 
> My position still is that instead of trying to dick around with
> maxusers curve slopes to try and somehow get the scaling right, we
> should instead be setting sensible defaults right from the start.
> 
> The current scaling was written when we had severe KVA constraints,
> did reservations, etc.  Now these limits are just caps on dynamic
> allocators on most platforms.
> 
> "Sensible" defaults would be *way* higher than the current maxusers
> derived scaling curves.
> 
> My quick survey:
> 8G ram -> 65088 clusters -> clusters capped at 6.2% of physical ram
> (running head)
> 3.5G ram -> 25600 clusters -> clusters capped at 5.0% of physical ram
> (running an old head)
> 32G ram -> 25600 clusters -> clusters capped at 1.5% of physical ram
> (running 9.1-stable)
> 72G ram -> 25600 clusters -> clusters capped at 0.06% of physical ram
> (9.1-stable again)
> 
> As I've been saying from the beginning..  As these are limits on
> dynamic allocators, not reservations, they should be as high as we can
> comfortably set them without risking running out of other resources.
> 
> As the code stands now..  the derived limits for 4k, 9k and 16k jumbo
> clusters are approximately the same space as 2k clusters.  (ie: 1 x 4k
> cluster per 2 x 2k clusters, 1 x 16k cluster per 8 x 2k clusters, and
> so on).  If we set a constant 6% for nmbclusters (since that's roughly
> where we're at now for smaller machines after Albert's changes), then
> the worst-case scenarios for 4k, 9k and 16k clusters are 6% each.  ie:
> 24% of wired, physical ram.
> 
> Plus all the other values derived from the nmbclusters tunable at boot.
> 
> I started writing this with the intention of suggesting 10% but that
> might be a bit high given that:
> kern_mbuf.c:        nmbjumbop = nmbclusters / 2;
> kern_mbuf.c:        nmbjumbo9 = nmbclusters / 4;
> kern_mbuf.c:        nmbjumbo16 = nmbclusters / 8;
> .. basically quadruples the worst case limits.
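
(The "quadruples" arithmetic can be checked directly. A small sketch using the divisors quoted above; the nmbclusters value is made up for illustration:)

```shell
#!/bin/sh
# Worst-case bytes each cluster pool can wire, per the kern_mbuf.c divisors.
nmbclusters=65536                        # illustrative value only

c2k=$((  nmbclusters        * 2048  ))   # standard 2k clusters
c4k=$(( (nmbclusters / 2)   * 4096  ))   # nmbjumbop  = nmbclusters / 2
c9k=$(( (nmbclusters / 4)   * 9216  ))   # nmbjumbo9  = nmbclusters / 4
c16k=$(( (nmbclusters / 8)  * 16384 ))   # nmbjumbo16 = nmbclusters / 8

# Each derived pool caps at roughly the same bytes as the 2k pool, so the
# combined worst case is about four times the 2k-cluster budget.
echo "2k=$c2k 4k=$c4k 9k=$c9k 16k=$c16k"
```

That is how a 6% budget for 2k clusters becomes roughly 24% of wired, physical RAM in the worst case.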
> 
> Out of the box, 6% is infinitely better than the 0.06% we currently get
> on a 9-stable machine with 72G ram.
> 
> But I object to dicking around with "maxusers" to derive network
> buffer space default limits.  If we settle on something like 6%, then
> it should be 6%.  That makes the tunable easy to document and its
> meaning easy to explain.
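
(A flat-percentage default of this kind reduces to a one-line calculation. The sketch below shows only the arithmetic; the 6% figure comes from the discussion above, and the variable names are hypothetical, not committed code:)

```shell
#!/bin/sh
# Derive nmbclusters as a flat 6% of physical memory, at 2048 bytes/cluster.
physmem=$(( 72 * 1024 * 1024 * 1024 ))       # e.g. the 72G box from the survey
nmbclusters=$(( physmem * 6 / 100 / 2048 ))
echo "nmbclusters=$nmbclusters"
```

Compare that with the 25600 clusters the same 72G machine gets today from the maxusers-derived curve.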
> -- 
> Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
> "All of this is for nothing if we don't go to the stars" - JMS/B5
> "If Java had true garbage collection, most programs would delete
> themselves upon execution." -- Robert Sewell

