vnodes - is there a leak? where are they going?

Allan Fields bsd at afields.ca
Wed Sep 1 17:12:59 PDT 2004


On Wed, Sep 01, 2004 at 06:53:47PM -0300, Marc G. Fournier wrote:
> On Wed, 1 Sep 2004, Allan Fields wrote:
> 
> >On Tue, Aug 31, 2004 at 09:21:09PM -0300, Marc G. Fournier wrote:
> >>
> >>I have two servers, both running 4.10 builds from within a few days of
> >>each other (Aug 5 for venus, Aug 7 for neptune) ... both running jail
> >>environments ... one with ~60 running, the other with ~80 ... the one
> >>with 60 has been running for ~25 days now, and is at the border of
> >>running out of vnodes:
> >>
> >>Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes:
> >>11058 - debug.vnlru_nowhere: 256463 - vlrup
> >>Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes:
> >>13155 - debug.vnlru_nowhere: 256482 - vlrup
> >>Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes:
> >>13092 - debug.vnlru_nowhere: 256482 - vlruwt
> >>
> >>[..]
> >>
> >>I've tried shutting down all of the VMs on venus, and umount'd all of the
> >>unionfs mounts, as well as the one nfs mount we have ... the above #s are
> >>after the VMs (and mounts) are recreated ...
> >>
> >>Now, my understanding of the vnodes is that for every file opened, a vnode
> >>is created ... in my case, since I'm using unionfs, there are two vnodes
> >>per file ... is it possible that there are 'stale' vnodes that aren't
> >>being freed up?  Is there some way of 'viewing' the vnode structure?
> >>
> >>For instance, fstat shows:
> >>
> >>venus# fstat | wc -l
> >>   19531
> >
> >You can also try pstat -f|more from the user side.
> 
> Even less:
> 
> venus# fstat | wc -l; pstat -f | wc -l
>    20930
>     6555
> 
> >You might want to set up for remote kernel debugging and peek around the
> >system / further examine vnode structures.  (If you have physical access
> >to two machines, you can set up a null-modem cable.)
> 
> Unfortunately, I'm working with a remote server here, so am quite limited 
> right now in what I can do ... anything I can, I will though ...

It's been suggested before [ http://www.freebsddiary.org/serial-console.php ]
that for management purposes, if two machines are in the same
facility, you can ask the facility staff to run a null-modem cable
(or, even better, two) between their serial ports.  One line can be
used for a remote console and the other for remote debugging (com2).
(There are also ways to switch between ddb and kgdb on the same
line.)

Then you need to update the flags in /boot/loader.conf or device.hints.
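
On 4.x the console redirection goes in /boot/loader.conf, while the
sio flags live in the kernel config file (device.hints only shows up
in 5.x).  Roughly, as a sketch to be matched against your own config
before rebuilding:

  # /boot/loader.conf: send the console to the first serial port
  console="comconsole"

  # kernel config: flags 0x10 marks the console port, 0x80 the remote
  # gdb port; add 'options DDB' (and 'makeoptions DEBUG=-g' for a
  # kernel with symbols) as well
  device sio0 at isa? port IO_COM1 flags 0x10 irq 4
  device sio1 at isa? port IO_COM2 flags 0x80 irq 3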

> >>So, where else are the vnodes going?  Is there a 'leak'?  What can I look
> >>at to try and narrow this down / provide more information?
> >
> >If the use count isn't decremented (to zero), vnodes won't
> >be placed on the freelist.  Perhaps something isn't
> >calling vrele() where it should in unionfs?  You should check the
> >reference counts: v_usecount and v_holdcnt on some of the suspect
> >vnodes.
> 
> How do I do that?  I'm at the limit of my current knowledge right now ... 
> willing to do the foot work, just don't know the directions to take from 
> here :(

One of the first debugging steps might be to try to reproduce the
behaviour on a non-production machine and figure out which code is
causing problems.  If you can afford to mess around w/ this machine,
you could set a breakpoint inside vnlru_proc() and step through to see
why vlrureclaim() keeps failing, leaving done == 0 so that
vnlru_nowhere gets incremented (which is happening a lot on your
system).

As the comments in the source suggest, this could be a buffer cache
and/or namei cache issue.

sys/kern/vfs_subr.c:
164 static int vnlru_nowhere = 0;
165 SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, &vnlru_nowhere, 0,
166     "Number of times the vnlru process ran without success");
..
530 static void
531 vnlru_proc(void)
532 {
..
563                 if (done == 0) {
564                         vnlru_nowhere++;
565                         tsleep(vnlruproc, PPAUSE, "vlrup", hz * 3);
566                 }
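
To answer the "how do I check the reference counts" question: with
remote gdb attached (gdb -k against a kernel built with debug
symbols), you can break right on the vnlru_nowhere++ line quoted
above and then poke at individual vnodes.  A sketch; the vnode
address below is made up, in practice you'd take one from a backtrace
or from walking a mount's vnode list:

  remote# gdb -k kernel.debug
  (kgdb) target remote /dev/cuaa0
  (kgdb) break vfs_subr.c:564
  (kgdb) continue
  (kgdb) print numvnodes
  (kgdb) print freevnodes
  (kgdb) print *(struct vnode *)0xc1234567
  (kgdb) print ((struct vnode *)0xc1234567)->v_usecount
  (kgdb) print ((struct vnode *)0xc1234567)->v_holdcnt

A vnode whose v_usecount stays above zero with no process accounting
for it is a good leak suspect.  From DDB on the console there's also
'show lockedvnods', which dumps vnodes with held locks.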

# ps -auwx|grep -v grep|grep vnlru:
root        10  0.0  0.0     0    0  ??  DL   17Jun04   1:43.81  (vnlru)

From the admin handbook:

10.28. What is vnlru?

vnlru flushes and frees vnodes when the system hits the kern.maxvnodes
limit.  This kernel thread sits mostly idle, and only activates if you
have a huge amount of RAM and are accessing tens of thousands of tiny
files.
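
You can watch those counters against the limit from userland with the
same sysctls that are in your log lines, plus the cap itself:

  venus# sysctl kern.maxvnodes debug.numvnodes debug.freevnodes debug.vnlru_nowhere

If numvnodes stays pinned near kern.maxvnodes while vnlru_nowhere
keeps climbing, vnlru is waking up but failing to reclaim anything,
which is exactly what your log shows.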


If this were a filedesc leak, this would be much easier, since file
descriptors are all visible from userspace; but vnode bookkeeping is
all on the kernel side.  At this point it's a guess whether the
problem is locking, something else in the VFS code, or a specific
file system.  I doubt it's FFS related, but maybe I'm wrong.

Are you having trouble doing normal things like starting daemons in
additional jails?  Maybe you can glean some more info by stressing the
machine further and watching for errors.
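
Something along these lines would do it, as a crude sketch (path and
count are arbitrary):

  venus# mkdir /tmp/vnstress && cd /tmp/vnstress
  venus# jot 50000 1 | while read i; do touch f.$i; done
  venus# sysctl debug.numvnodes debug.freevnodes

If touch starts failing, or freevnodes never recovers after you rm
the files, that gives you more data points.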

> httpd: 7416
> master: 6618
> syslogd: 1117
> qmgr: 780
> pickup: 779
> smtpd: 609
> sshd: 503
> cron: 495
> perl: 279
> trivial-rewrite: 274
> 
> but, again, those are known/open files ... fstat | wc -l only accounts for 
> ~20k or so of that list :(

Yup, userspace tools alone probably won't be able to provide much
useful info.
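
One partial exception is the zone allocator's statistics: 4.x
allocates vnodes out of a zalloc zone, so the zone counters at least
show how many vnode structures exist versus how many are in use
(a sketch; the exact output format varies):

  venus# sysctl vm.zone | grep -i vnode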

-- 
 Allan Fields, AFRSL - http://afields.ca
 2D4F 6806 D307 0889 6125  C31D F745 0D72 39B4 5541

