vnode leak in NFS (Was: Re: 4.11-STABLE leaks vnodes worse than 4.x from Feb 13th ... ?)

Marc G. Fournier scrappy at hub.org
Sun Jul 17 15:10:37 GMT 2005


Wow, now this was unexpected ... figured I'd try a quick theory this morning 
... debug.freevnodes was down to:

debug.freevnodes: 103987

umount /nfs and ...

# sysctl debug.freevnodes
debug.freevnodes: 332106


The vnode leak isn't in the unionfs code ... it's in the NFS code :(
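
For anyone who wants to repeat the check, here is a rough sketch of the arithmetic as a small sh script. The helper names (sysctl_value, freevnodes_delta) are made up for illustration and are not part of any FreeBSD tool; the two readings are the ones shown above. On a live box you would capture the output of "sysctl debug.freevnodes" before and after the umount instead of hard-coding the lines.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any FreeBSD utility.

# Extract the numeric value from a "name: value" line as printed by sysctl.
sysctl_value() {
    printf '%s\n' "$1" | awk -F': *' '{ print $2 }'
}

# Print how many vnodes were freed between two debug.freevnodes readings.
freevnodes_delta() {
    before=$(sysctl_value "$1")
    after=$(sysctl_value "$2")
    echo $(( $after - $before ))
}

# Readings taken before and after "umount /nfs" in this message.
# Live equivalent (assumption: run as root on the affected host):
#   b=$(sysctl debug.freevnodes); umount /nfs; a=$(sysctl debug.freevnodes)
freevnodes_delta "debug.freevnodes: 103987" "debug.freevnodes: 332106"
```

With the numbers above this prints 228119, i.e. roughly 228k vnodes came back as soon as the NFS mount went away.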


On Fri, 15 Jul 2005, Marc G. Fournier wrote:

>
> Recently, I started having problems with one of my newest servers ... 
> figuring that it might have something to do with the fact that I went SATA 
> for this one (all others are SCSI), I figured it might be a driver issue 
> causing the problems, since everything else is the same as the other 5 
> servers on our network ...
>
> Today, I'm starting to wonder if I've been just looking at the "most obvious" 
> cause, instead of looking deeper ...
>
> The problem that manifests itself is similar to the old 'ran out of vnode' 
> issue I used to experience under 4.x ... the server would still run, be 
> totally pingable, and you could even get the motd when you tried to ssh in, 
> but you couldn't get a prompt, and all processes were hung ...
>
> I just upgraded the kernel on this machine (mercury) on the 13th of July, and 
> it's been running 1 day, 12 hrs now ... there is hardly anything running on 
> this machine (10 jails), and vnode usage is:
>
> debug.numvnodes: 336460 - debug.freevnodes: 5275 - debug.vnlru_nowhere: 0 - 
> vlruwt
>
> One of my older servers (neptune), running kernels from Feb 13th of this 
> year, and with 81 jails running on it, is using up *significantly fewer* 
> vnodes (uptime: 1 day, 10 hours):
>
> debug.numvnodes: 279710 - debug.freevnodes: 91442 - debug.vnlru_nowhere: 0 - 
> vlruwt
>
> Now, compared to neptune, mercury isn't running anything special ... several 
> apache 1 processes, postfix, cyrus-imapd and that's it ... neptune on the 
> other hand, is running the full gamut ... aolserver, java, apache 1 and 2, 
> postfix, etc ...
>
> So, I'm starting to think that the problem isn't "hardware related", but the 
> kernel itself ... the latest 4.11-STABLE kernel seems to have brought in new 
> vnode leakage, or ... vnlru isn't working as it should be to free up vnodes 
> ...
>
> Looking at that process on mercury:
>
> # ps aux | grep vnlru
> root        7  0.0  0.0     0    0  ??  DL   Wed11PM   0:00.65  (vnlru)
>
> whereas on neptune:
>
> # ps aux | grep vnlru
> root          9  0.0  0.0     0    0  ??  DL   Thu01AM   0:00.79  (vnlru)
>
> so about the same amount of CPU time being expended ... a bit more on the more 
> loaded server, but not a major amount ...
>
> I'd like to try and debug this, but don't know where to start ... I realize 
> that 4.x isn't being pushed anymore, but there are a lot of us that haven't 
> moved to 5.x yet (am working on that for our next server, but it's going to 
> take me several months before I can convert all our existing servers up) ...
>
> I do have a serial console on this server, if that helps to debug things ...
>
> I've heard that there was some work done on 5.x to clean up some of the vnode 
> leaks ... not sure if that is fact or just rumor ... but, if so, would any of 
> them be MFCable to 4.x?
>
> Thanks ...
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email: scrappy at hub.org           Yahoo!: yscrappy              ICQ: 7615664

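To put the numbers from the quoted message side by side, here is a rough sketch that works out vnodes in use (numvnodes minus freevnodes) and a crude per-jail average for both machines. The function names are hypothetical, and dividing by jail count is only a ballpark comparison, not a claim about where the vnodes actually live.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the two hosts in the quoted message.

# vnodes_in_use NUMVNODES FREEVNODES: vnodes currently held (total minus free).
vnodes_in_use() {
    echo $(( $1 - $2 ))
}

# per_jail IN_USE JAILS: integer average of in-use vnodes per jail.
per_jail() {
    echo $(( $1 / $2 ))
}

mercury=$(vnodes_in_use 336460 5275)    # 10 jails, new kernel
neptune=$(vnodes_in_use 279710 91442)   # 81 jails, Feb 13th kernel

echo "mercury: $mercury in use, $(per_jail "$mercury" 10) per jail"
echo "neptune: $neptune in use, $(per_jail "$neptune" 81) per jail"
```

That works out to 331185 in use on mercury (about 33118 per jail) against 188268 on neptune (about 2324 per jail).
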
----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy at hub.org           Yahoo!: yscrappy              ICQ: 7615664

