vnode leak in FFS code ... ?
Marc G. Fournier
scrappy at hub.org
Fri Sep 3 20:49:07 PDT 2004
Just as a follow-up to this ... the server crashed on Thursday night around
22:00 ADT, and only just came back up after a very long fsck ... with all 62
VMs started up, and 1008 processes running, vnodes currently look like:
Sep 4 00:43:00 venus root: debug.numvnodes: 58370 - debug.freevnodes: 824 - debug.vnlru_nowhere: 0 - vlruwt
Sep 4 00:44:00 venus root: debug.numvnodes: 58370 - debug.freevnodes: 782 - debug.vnlru_nowhere: 0 - vlruwt
Sep 4 00:45:01 venus root: debug.numvnodes: 58370 - debug.freevnodes: 977 - debug.vnlru_nowhere: 0 - vlruwt
Sep 4 00:46:00 venus root: debug.numvnodes: 58370 - debug.freevnodes: 582 - debug.vnlru_nowhere: 0 - vlruwt
venus# ps aux | wc -l
the vnode counts are about a tenth of what they were when the server crashed :(
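For anyone who wants to watch the same counters, a one-minute cron job along
these lines produces that log format (a sketch only -- the vnlru-state lookup
in particular is an approximation, not the actual syswatch script):

#!/bin/sh
# Log vnode counters plus vnlru's wait channel ('vlruwt' when idle,
# 'vlrup' between reclaim passes), once a minute from root's crontab.
num=`sysctl -n debug.numvnodes`
free=`sysctl -n debug.freevnodes`
nowhere=`sysctl -n debug.vnlru_nowhere`
# grab vnlru's current wait channel; empty if it happens to be running
state=`ps -axo wchan,comm | awk '$2 == "vnlru" { print $1 }'`
logger "debug.numvnodes: $num - debug.freevnodes: $free - debug.vnlru_nowhere: $nowhere - $state"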
On Wed, 1 Sep 2004, Marc G. Fournier wrote:
> On Wed, 1 Sep 2004, Allan Fields wrote:
>>> It's really hard to tell if there is a vnode leak here. The vnode pool
>>> is fairly fluid and has nothing to do with the number of files that are
>>> actually 'open'. Vnodes get created when the VFS layer wants to access
>>> an object that isn't already in the cache, and only get destroyed when
>>> the object is destroyed. A vnode that represents a file that was opened
>>> will stay 'active' in the system long after the file has been closed,
>>> because it's cheaper to keep it active in the cache than it is to
>>> discard it and then risk having to go through the pain of a namei()
>>> and VOP_LOOKUP() again later. Only if the kern.maxvnodes limit is hit will
>>> old vnodes start getting recycled to represent other objects. [...]
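>>> (One way to see this from userland, paths illustrative: the first walk
>>> of a big, uncached tree makes numvnodes climb even though no files are
>>> left open afterwards:
>>>
>>> venus# sysctl -n debug.numvnodes
>>> venus# find /usr/src > /dev/null
>>> venus# sysctl -n debug.numvnodes
>>>
>>> each lookup of an object not already in the cache allocates a vnode,
>>> and that vnode stays cached long after find is done with it.)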
>>> So you've obviously bumped up kern.maxvnodes well above the limits that
>>> are normally generated from the auto-tuner. Why did you do that, if not
>>> because you knew that you'd have a large working set of referenced (but
>>> maybe not open all at once) filesystem objects? [...]
>> There was a previous thread I've found which also helps explain
>> this further:
>> Really the same issue now as then?
> I'm not getting the hangs now, it is freeing up vnodes ... but it's having to
> work very hard to do so, or so it seems:
> venus# ps aux | grep vnlru
> root 7 3.0 0.0 0 0 ?? DL 5Aug04 606:34.54 (vnlru)
> I started up the script for monitoring this on Aug 29th ... since then, there
> have been 4331 entries in the log file, of which 1927 are in 'vlrup', which
> I believe means vnlru is running through its lists trying to find vnodes to
> free up, if I recall the code correctly ... ?
> venus# grep vnode /var/log/syswatch | wc -l
> venus# grep vnode /var/log/syswatch | grep vlrup | wc -l
> and this is based on a check every minute ...
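> A quick way to tally the states, assuming the state name is always the
> last field on each log line:
>
> venus# awk '/vnode/ { print $NF }' /var/log/syswatch | sort | uniq -c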
> The other server, running ~19 more VMs (~100 more processes), only up 2 days
> now, seems to be faring better:
> debug.numvnodes: 344062 - debug.freevnodes: 168285 - debug.vnlru_nowhere: 0 -
> I've scheduled 'maintenance' on that server for Saturday ... am going to
> shut down all 'non-host server' processes, and unmount the large file system
> (where all the VMs run from) ... see if that cleans up any of the vnodes
> without having to do a reboot ...
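> Roughly like this (the mount point is illustrative, not the real path):
>
> # umount /vms
> # sysctl debug.numvnodes debug.freevnodes
>
> If those vnodes really are reclaimable, freevnodes should jump right
> after the unmount.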
> If that doesn't work, I could cause a panic and have it dump core, if that
> would provide for easier/better debugging ... ?
> I have limited flexibility with the server, but it is a 'real' server without
> a fake load on it, and as solid as I've always considered FreeBSD to be, I
> seem to have a knack for pushing it and breaking it :( ... so whatever data I
> can provide to make it that much more solid, even if it involves a little bit
> of downtime to get a good core dump, I'm willing to do ...
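> For reference, since bumping kern.maxvnodes came up above: it's an
> ordinary read/write sysctl (the number here is illustrative):
>
> venus# sysctl kern.maxvnodes
> venus# sysctl kern.maxvnodes=400000
>
> plus a matching line in /etc/sysctl.conf if it should survive a reboot.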
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy at hub.org Yahoo!: yscrappy ICQ: 7615664