Limits on jumbo mbuf cluster allocation

Garrett Wollman wollman at hergotha.csail.mit.edu
Sat Mar 9 18:46:11 UTC 2013


In article <20795.29370.194678.963351 at hergotha.csail.mit.edu>, I wrote:
><<On Sat, 9 Mar 2013 11:50:30 -0500 (EST), Rick Macklem
><rmacklem at uoguelph.ca> said:
>> I've thought about this. My concern is that the separate thread might
>> not keep up with the trimming demand. If that occurred, the cache would
>> grow veryyy laarrggge, with effects like running out of mbuf clusters.
>
>At a minimum, once one nfsd thread is committed to doing the cache
>trim, a flag should be set to discourage other threads from trying to
>do it.  Having them all spinning their wheels punishes the clients
>much too much.
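
The "one trimmer at a time" flag suggested above can be shown with a non-blocking lock. This is a minimal Python sketch, not FreeBSD kernel code. The names `trim_lock` and `maybe_trim_cache` and the oldest-first eviction policy are hypothetical; a real nfsd would use the kernel's own mutex or atomic primitives. The key property is that a thread that loses the race returns immediately instead of spinning.

```python
import threading

# Hypothetical names: trim_lock and maybe_trim_cache are illustrative only.
trim_lock = threading.Lock()

def maybe_trim_cache(cache: dict, high_water: int) -> bool:
    """Trim `cache` to `high_water` entries unless another thread is already trimming.

    Returns True if this call did the trim, and False if nothing needed trimming
    or another thread holds the trim lock.
    """
    if len(cache) <= high_water:
        return False
    # Non-blocking acquire: a thread that loses the race goes back to serving
    # RPCs instead of waiting for the trim to finish.
    if not trim_lock.acquire(blocking=False):
        return False
    try:
        # Hypothetical policy: evict oldest entries first (dicts keep insertion order).
        while len(cache) > high_water:
            cache.pop(next(iter(cache)))
        return True
    finally:
        trim_lock.release()
```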

Also, it occurs to me that this strategy is subject to livelock.  To
put backpressure on the clients, it is far better to get them to stop
sending (by advertising a small receive window) than to accept their
traffic but queue it for a long time.  By the time the NFS code gets
an RPC, the system has already invested so much in it that it should
be processed as quickly as possible, and this strategy essentially
guarantees[1] that, once those 2 MB socket buffers start to fill up, they
will stay filled, sending latency through the roof.  If nfsd didn't
override the usual socket-buffer sizing mechanisms, then sysadmins
could limit the buffers to ensure a stable response time.
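
To put a rough number on "latency through the roof": a request that arrives behind a full receive buffer has to wait until everything ahead of it is drained. The sketch below takes "2 MB" to mean 2 MiB. The drain rate it uses, 10 MiB/s for a server that has slowed down, is a hypothetical figure chosen only for illustration and is not given in the message.

```python
def queue_delay_ms(buffered_bytes: int, drain_bytes_per_sec: int) -> float:
    """Milliseconds a new request waits behind `buffered_bytes` of queued data."""
    return buffered_bytes * 1000 / drain_bytes_per_sec

# Hypothetical drain rate for a slowed server; not a measured figure.
SLOW_DRAIN = 10 * 1024 * 1024

full_2mib = queue_delay_ms(2 * 1024 * 1024, SLOW_DRAIN)
full_256kib = queue_delay_ms(256 * 1024, SLOW_DRAIN)
```

At that assumed rate, a full 2 MiB buffer adds 200 ms of queueing delay to every new request, while a 256 KiB buffer adds 25 ms. The ratio (8x) holds at any drain rate.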

The bandwidth-delay product in our network is somewhere between 12.5
kB and 125 kB, depending on how the client is connected and what sort
of latency they experience.  The usual rule of thumb is that socket
buffers need be no larger than twice the bandwidth-delay product, i.e.,
about 256 kB.
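
The bandwidth-delay product is link rate times round-trip time. The message does not give the link speed or RTT. One combination consistent with the 12.5 kB to 125 kB range is a 1 Gbit/s path with an RTT between 0.1 ms and 1 ms, and that combination is assumed below. Twice 125 kB is 250 kB, which rounds up to 256 KiB as a power of two.

```python
def bdp_bytes(link_bits_per_sec: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes for a link rate (bit/s) and RTT (microseconds)."""
    return link_bits_per_sec * rtt_us // (8 * 1_000_000)

def round_up_pow2(n: int) -> int:
    """Smallest power of two >= n, for n > 0."""
    return 1 << (n - 1).bit_length()

# Assumed path: 1 Gbit/s with RTT from 0.1 ms (100 us) to 1 ms (1000 us).
low = bdp_bytes(1_000_000_000, 100)     # 12.5 kB
high = bdp_bytes(1_000_000_000, 1_000)  # 125 kB
suggested_buffer = round_up_pow2(2 * high)
```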

I'd actually like to see something like WFQ in the NFS server to allow
me to limit the amount of damage one client or group of clients can
do without unnecessarily limiting other clients.
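
A per-client fair queue could look roughly like the sketch below. It is not an existing nfsd interface, and the class and method names are hypothetical. Full WFQ needs a fluid-model virtual clock. This uses a simpler relative, self-clocked fair queueing (SCFQ): each request gets a virtual finish tag computed per client, and requests are served in tag order. A client that floods the server then mostly delays its own later requests.

```python
import heapq
from collections import defaultdict

class FairRpcQueue:
    """Hypothetical SCFQ-based request queue keyed by client."""

    def __init__(self):
        self._heap = []                          # (finish_tag, seq, client, request)
        self._last_finish = defaultdict(float)   # last finish tag per client
        self._vtime = 0.0                        # finish tag of the last request served
        self._seq = 0                            # FIFO tie-breaker; also keeps heap
                                                 # comparisons away from client/request

    def enqueue(self, client, request, size, weight=1.0):
        # A client's new request starts after its own previous one, or at the
        # current virtual time if that client has been idle.
        start = max(self._vtime, self._last_finish[client])
        finish = start + size / weight
        self._last_finish[client] = finish
        heapq.heappush(self._heap, (finish, self._seq, client, request))
        self._seq += 1

    def dequeue(self):
        finish, _, client, request = heapq.heappop(self._heap)
        self._vtime = finish  # "self-clocked": virtual time follows served tags
        return client, request

    def __len__(self):
        return len(self._heap)
```

For example, if client A queues four equal-size requests and client B then queues one, FIFO would serve B fifth. This queue serves B second: A, B, A, A, A.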

-GAWollman

[1] The largest RPC is a bit more than 64 KiB (negotiated), so if the
server gets slow, the 2 MB receive queue will be refilled by the
client before the server manages to perform the RPC and send a
response.
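
The footnote's arithmetic can be spelled out as follows, taking 2 MB as 2 MiB. A full receive queue holds a few dozen maximal RPCs, while a buffer sized near twice the bandwidth-delay product holds only a handful. Because a maximal RPC is a bit more than 64 KiB, slightly fewer whole RPCs fit than the round numbers suggest. The one-byte overhead used below is an illustrative stand-in for that "bit more".

```python
def max_queued_rpcs(recv_buf_bytes: int, rpc_bytes: int) -> int:
    """Number of whole RPCs of `rpc_bytes` that fit in a receive buffer."""
    return recv_buf_bytes // rpc_bytes

KIB = 1024
MIB = 1024 * KIB

# Exactly 64 KiB per RPC, versus an illustrative 64 KiB + 1 byte.
big_exact = max_queued_rpcs(2 * MIB, 64 * KIB)
big_over = max_queued_rpcs(2 * MIB, 64 * KIB + 1)
small_exact = max_queued_rpcs(256 * KIB, 64 * KIB)
small_over = max_queued_rpcs(256 * KIB, 64 * KIB + 1)
```

With a 2 MiB buffer, 32 maximal RPCs (31 once they exceed 64 KiB) can sit ahead of a new request. With 256 KiB, at most 4 (or 3) can.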


More information about the freebsd-net mailing list