nfsd server cache flooded, try to increase nfsrc_floodlevel

Rick Macklem rmacklem at uoguelph.ca
Fri Jul 25 10:38:33 UTC 2014


Harald Schmalzbauer wrote:
> Regarding Rick Macklem's message of 25.07.2014 02:14 (localtime):
> > Harald Schmalzbauer wrote:
> >> Regarding Rick Macklem's message of 08.08.2013 14:20 (localtime):
> >>> Lars Eggert wrote:
> >>>> Hi,
> >>>>
> >>>> every few days or so, my -STABLE NFS server (v3 and v4) gets wedged
> >>>> with a ton of messages about "nfsd server cache flooded, try to
> >>>> increase nfsrc_floodlevel" in the log, and nfsstat shows TCPPeak at
> >>>> 16385. It requires a reboot to unwedge; restarting the server does
> >>>> not help.
> >>>>
> >>>> The clients are (mostly) six -CURRENT nfsv4 boxes that netboot from
> >>>> the server and mount all drives from there.
> >>>>
> > Have you tried increasing vfs.nfsd.tcphighwater?
> > This needs to be increased to increase the flood level above 16384.
> >
> > Garrett Wollman sets:
> > vfs.nfsd.tcphighwater=100000
> > vfs.nfsd.tcpcachetimeo=300
> >
> > or something like that, if I recall correctly.
> 
> Thank you for your help!
> 
> I read about tuning these sysctls, but I object to altering them
> individually, because I don't have hundreds of clients torturing a poor
> server, or any other poorly balanced setup.
> I run into this problem with one client, connected via a 1GbE (not 10
> or 40GbE) link, talking to a modern server with 10G of RAM - and this
> environment forces me to reboot the storage server every second day.
> IMHO such a setup shouldn't require manual tuning, and I consider this
> a really urgent problem!
Btw, what you can do to help with this is experiment with the tunable,
and if you find a setting that works well for your server, report it
back as a data point that can be used for this.

If you make it too large, the server runs out of address space that
can be used by malloc(), and that wedges the whole machine, not just
the NFS server.
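For reference, a minimal sketch of how one might experiment with these
tunables on the server. The values shown are only the starting point
Garrett Wollman reportedly uses, not a recommendation; start lower and
watch TCPPeak before raising further:

```shell
# Inspect the current settings and the observed server cache peak.
sysctl vfs.nfsd.tcphighwater vfs.nfsd.tcpcachetimeo
nfsstat -e -s          # extended server stats, including TCPPeak

# Raise the cache high-water mark and shorten the cache timeout
# at runtime (values are Garrett Wollman's reported settings).
sysctl vfs.nfsd.tcphighwater=100000
sysctl vfs.nfsd.tcpcachetimeo=300

# Persist the chosen values across reboots.
cat >> /etc/sysctl.conf <<'EOF'
vfs.nfsd.tcphighwater=100000
vfs.nfsd.tcpcachetimeo=300
EOF
```

Since an overly large high-water mark can exhaust malloc() address
space, it is worth increasing it in steps rather than jumping straight
to a large value.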

> Whatever causes the server to lock up urgently needs to be fixed for
> the next release; otherwise the shipped NFS implementation is not
> really suitable for production environments and needs a warning
> message when enabled.
> The impact of this failure forces admins to change the operating
> system in order to get a core service back into operation.
> The point is that I don't merely suffer from weaker performance or
> lags/delays: my server stops serving NFS completely, and only a
> reboot resolves the situation.
> 
> Are there later modifications or other findings known to obsolete
> your noopen.patch (http://people.freebsd.org/~rmacklem/noopen.patch)?
> 
> I'm testing this atm, but I'm having other panics on the same machine
> related to VFS locking, so results of the test won't be available
> soon.
> 
> Thank you,
> 
> -Harry


More information about the freebsd-stable mailing list