leaking lots of unreferenced inodes (pg_xlog files?)

Sun Jun 2 21:01:13 UTC 2013

> Date: Sun, 02 Jun 2013 22:35:23 +0200
> From: Palle Girgensohn <girgen at freebsd.org>
> To: Kirk McKusick <mckusick at mckusick.com>
> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) 
> Cc: freebsd-fs at freebsd.org, Dan Thomas <godders at gmail.com>,
>         Jeff Roberson <jroberson at jroberson.net>,
>         Julian Akehurst <julian at pingpong.se>
> 
> --On 31 May 2013 11.25.40 -0700 Kirk McKusick <mckusick at mckusick.com> wrote:
> 
>> Your results are very enlightening. Especially the fact that you have
>> to do a forcible unmount of the filesystem. What that tells me is that
>> somehow we are getting vnodes that have phantom references. That is,
>> there is some system call where we get a reference on a vnode (vref,
>> vget, or similar) that does not ultimately have a corresponding drop
>> of the reference (vrele, vput, or similar). The net effect is that
>> the file is held open despite the fact that there are no longer any
>> connections to it. When you do the forcible unmount, the kernel walks
>> the list of vnodes associated with the filesystem and does a vgone on
>> each of them. That causes each to be inactivated which then triggers
>> the release of their associated disk space. The reason that the unmount
>> takes 20 seconds is to process all the releasing of the space. My guess
>> is that there is an error path in some system call that is missing the
>> vrele or vput.
>>
>> Assuming that you are able to run some more tests on your test machine,
>> the next step in narrowing down the set of code to look at is to try
>> running your system with soft updates disabled. The idea is to find out
>> whether the mismatched references are in the soft updates code or are
>> in one of the filesystem system calls themselves. To disable soft updates
>> run the command `tunefs -n disable /pgsql' on the unmounted /pgsql
>> filesystem. If the system then runs without the problem, I will know
>> to search the soft updates code. If the problem persists, then I'll
>> know to look in the system calls themselves. You may want to do some
>> preliminary tests to see how quickly the problem manifests itself.
>> You can do this by running it for a short time (10 minutes say) and
>> then checking to see if you need to do a forcible unmount of the
>> filesystem. Once you establish how long you have to run before you
>> reliably have to do a forcible unmount, you will know how long to
>> run the test with soft updates turned off. If you find that running
>> with soft updates turned off makes your application run too slowly
>> you can mount your filesystem asynchronously. Note however, that you
>> should not run asynchronously if the data on the filesystem is critical
>> as you may end up with an unrecoverable filesystem after a power failure
>> or system crash. So only run asynchronously if you can afford to lose
>> your filesystem.
>>
>> Finally, it would be helpful if you could add two more commands to
>> your diskspacecheck.sh script:
>>
>> 	sysctl -a | egrep vnode
>> 	mount -v
>>
>> The first shows the vnode usage and the second shows the operational
>> state of your filesystems.
>>
>> 	Kirk McKusick
> 
> OK, I have now turned off soft updates. This is on the test server. It is
> not as busy as the production machine, but I'll keep an eye on it and will
> mail new results as soon as I see any evidence either that soft updates
> is the culprit or that it is not.
> 
> FWIW, I attach the script from this remount process as well, which includes
> 
> sysctl -a | grep vnode ; mount -v.
> 
> Note that it is all in one script file this time.
> 
> Cheers,
> Palle

This looks good. Keep me posted.

	Kirk McKusick
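
For anyone following along, the two extra diagnostics requested above (vnode usage and mount state) could be collected with something like the sketch below. This is only an illustration, not the actual diskspacecheck.sh from this thread, which is not shown here. The function names vnode_lines and snapshot are hypothetical; only `sysctl -a`, `egrep vnode` (written here as `grep -E`), and `mount -v` come from the messages above.

```shell
#!/bin/sh
# Hypothetical helper, NOT the diskspacecheck.sh discussed in the thread.

# Keep only the vnode-related lines from "sysctl -a"-style text on stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
vnode_lines() {
    grep -E 'vnode' || true
}

# Print one timestamped snapshot: vnode counters first, then the
# operational state of the mounted filesystems.
snapshot() {
    echo "== $(date -u '+%Y-%m-%d %H:%M:%S') UTC =="
    sysctl -a 2>/dev/null | vnode_lines
    mount -v
}
```

Calling `snapshot >> check.log` at intervals (check.log is an assumed file name) would make it possible to compare the vnode counters before and after each forcible unmount.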