nmbclusters: how do we want to fix this for 8.3 ?
Jack Vogel
jfvogel at gmail.com
Sat Mar 24 21:18:04 UTC 2012
This whole issue only came up on a system with 10G devices, and only igb
does anything like what you're talking about, not the devices/drivers found
on most low-end systems. So we are trading red herrings, it would seem.
I'm not opposed to economizing things in a sensible way; it was I who
brought the issue up, after all :)
Jack
On Sat, Mar 24, 2012 at 2:02 PM, Juli Mallett <jmallett at freebsd.org> wrote:
> On Sat, Mar 24, 2012 at 13:33, Jack Vogel <jfvogel at gmail.com> wrote:
> > On Sat, Mar 24, 2012 at 1:08 PM, John-Mark Gurney <jmg at funkthat.com> wrote:
> >> If we had some sort of tuning algorithm that would keep track of the
> >> current receive queue usage depth, and always keep enough mbufs on the
> >> queue to handle the largest expected burst of packets (either historical,
> >> or by looking at the largest TCP window size, etc.), this would both
> >> improve memory usage and in general reduce the number of required mbufs
> >> on the system... If you have fast processors, you might be able to get
> >> away with fewer mbufs since you can drain the receive queue faster, but
> >> on slower systems you would use more mbufs.
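A minimal sketch of the sort of policy John-Mark describes, written as a
standalone userland model rather than driver code; every name and constant
below is made up for illustration:

/*
 * Standalone model of burst-driven receive ring sizing.  Not driver
 * code; all names and constants here are hypothetical.
 */
#define RING_MIN        128     /* never shrink below this */
#define RING_MAX        4096    /* hard cap on descriptors/mbufs */
#define HISTORY         8       /* sampling intervals remembered */
#define SLACK_PCT       25      /* headroom above the observed peak */

struct rxq_model {
	unsigned burst[HISTORY];        /* deepest occupancy per interval */
	unsigned idx;
};

/* Called once per sampling interval with the deepest occupancy seen. */
static void
rxq_record_burst(struct rxq_model *q, unsigned depth)
{
	q->burst[q->idx % HISTORY] = depth;
	q->idx++;
}

/* Target ring size: largest recent burst plus some slack, clamped. */
static unsigned
rxq_target_size(const struct rxq_model *q)
{
	unsigned i, peak, target;

	peak = 0;
	for (i = 0; i < HISTORY; i++)
		if (q->burst[i] > peak)
			peak = q->burst[i];
	target = peak + (peak * SLACK_PCT) / 100;
	if (target < RING_MIN)
		target = RING_MIN;
	if (target > RING_MAX)
		target = RING_MAX;
	return (target);
}

The arithmetic is the easy part; the driver-side questions of when to
sample and when it is safe to apply a new target are where the corner
cases Juli mentions below come in.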
> >
> > These are the days when machines might have 64 GIGABYTES of main storage,
> > so having sufficient memory to run high-performance networking seems
> > little to ask.
>
> I think the suggestion is that this should be configurable. FreeBSD
> is also being used in production, on systems with <128MB of RAM, for
> networking-related tasks. And it works fine, more or less.
>
> >> This tuning would also fix the problem of interfaces not coming up,
> >> since at boot each interface might only allocate 128 or so mbufs, and
> >> then dynamically grow as necessary...
> >
> > You want modern, fast networked servers but only give them 128 mbufs?
> > Ya right, allocating memory takes time, so when you do this people will
> > whine about latency :)
>
> Allocating memory doesn't have to take much time. A multi-queue
> driver could steal mbufs from an underutilized queue. It could grow
> the number of descriptors based on load. Some of those things are
> hard to implement in the first place and harder to cover the corner
> cases of, but not all.
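As a rough illustration of why the allocation need not show up as latency
on the hot path, here is a userland sketch that defers ring growth to a
maintenance task; the structure names and thresholds are hypothetical, not
taken from any FreeBSD driver:

/*
 * Userland model of deferred ring growth.  The hot path only sets a
 * flag; a maintenance task (a callout or taskqueue in a real driver)
 * pays the allocation cost outside packet processing.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct rx_ring {
	void		**slots;	/* stand-in for descriptors/mbufs */
	size_t		size;		/* current ring size, assumed >= 1 */
	size_t		max_size;
	atomic_bool	want_grow;
};

/* Hot path: a cheap check and a flag, no allocation here. */
static inline void
rx_note_pressure(struct rx_ring *r, size_t occupancy)
{
	if (occupancy > (r->size * 3) / 4)	/* more than 75% full */
		atomic_store(&r->want_grow, true);
}

/* Maintenance path: actually grow the ring. */
static void
rx_maybe_grow(struct rx_ring *r)
{
	size_t newsize;
	void **p;

	if (!atomic_exchange(&r->want_grow, false))
		return;
	newsize = r->size * 2;
	if (newsize > r->max_size)
		newsize = r->max_size;
	if (newsize == r->size)
		return;
	p = realloc(r->slots, newsize * sizeof(*p));
	if (p == NULL)
		return;			/* on failure, keep the old ring */
	r->slots = p;
	r->size = newsize;
}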
>
> > When you start pumping 10G...40G...100G, the scale of the system
> > is different; thinking in terms of the old 10Mb or 100Mb days just
> > doesn't work.
>
> This is a red herring. Yes, some systems need to do 40/100G. They
> require special tuning. The default shouldn't assume that everyone is
> getting maximum pps. This seems an especially silly argument when much
> of the available silicon can't even keep up with maximum packet rates
> at minimum frame sizes, at 10G or even at 1G.
>
> But again, 1G NICs are the default now. Does every FreeBSD system
> with a 1G NIC have loads of memory? No. I have an Atheros system
> with 2 1G NICs and 256MB of RAM. It can't do anything at 1gbps. Not
> even drop packets. Why should its memory usage model be tuned for
> something it can't do?
>
> I'm not saying it should be impossible to allocate a bajillion
> gigaquads of memory to receive rings, I certainly do it myself all the
> time. But choosing defaults is a tricky thing, and systems that are
> "pumping 10G" need other tweaks anyway, whether that's enabling
> forwarding or something else, because they have to be configured for
> the task they are to do. If part of that is increasing the number of
> receive descriptors (as the Intel drivers already allow us to do;
> thanks, Jack) and the number of queues, is that such a bad thing? I
> really don't think it makes sense for my 8-core or 16-core system to
> come up with 8 or 16 queues *per interface*. Under heavy load, 8/N or
> 16/N queues, where N is the number of interfaces, makes more sense;
> 1 queue per port is *ideal* if a single core can handle the load of
> that interface.
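For what it's worth, that kind of per-driver tuning is already exposed
through loader tunables; the exact knob names vary by driver and release,
but on an igb(4) system it looks roughly like this (the values are
illustrative, not recommendations):

# /boot/loader.conf -- illustrative values only
hw.igb.rxd=4096               # receive descriptors per ring
hw.igb.txd=4096               # transmit descriptors per ring
hw.igb.num_queues=2           # cap queue count rather than one per core
kern.ipc.nmbclusters=262144   # raise the global mbuf cluster limit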
>
> > Sorry, but the direction is to scale everything, not pare back on the
> > network, IMHO.
>
> There is not just one direction. There is not just one point of
> scaling. Relatively new defaults do not necessarily have to be
> increased in the future. I mean, should a 1G NIC use 64 queues on a
> 64-core system that can do 100gbps @ 64 bytes on one core? It's
> actively harmful to performance. The answer to "what's the most
> sensible default?" is not "what does a system that just forwards
> packets need?" A system that just forwards packets already needs IPs
> configured and a sysctl set. If we make it easier to change the
> tuning of the system for that scenario, then nobody's going to care
> what our defaults are, or think us "slow" for them.
>
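For reference, the forwarding setup alluded to above really is just a
couple of knobs, along these lines:

# /etc/rc.conf -- enable forwarding at boot
gateway_enable="YES"            # IPv4: sets net.inet.ip.forwarding=1
ipv6_gateway_enable="YES"       # IPv6, if the box routes IPv6

# or immediately, without a reboot:
#   sysctl net.inet.ip.forwarding=1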