leaking lots of unreferenced inodes (pg_xlog files?)

Sun Jun 2 20:36:16 UTC 2013

--On 31 May 2013 11.25.40 -0700 Kirk McKusick <mckusick at mckusick.com> wrote:

>> Date: Thu, 30 May 2013 12:56:54 +0200
>> From: Palle Girgensohn <girgen at FreeBSD.org>
>> To: Kirk McKusick <mckusick at mckusick.com>
>> CC: freebsd-fs at FreeBSD.org, Jeff Roberson <jroberson at jroberson.net>,
>>         Dan Thomas <godders at gmail.com>, Julian Akehurst
>>         <julian at pingpong.se>
>> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>>
>> Hello again!
>>
>> I have now remounted the postgresql filesystem on a test server that
>> experiences the same problem. The production server has not been remounted
>> yet; we're planning that in approximately a week's time, but I thought I
>> could gain some time by running the suggested procedure on the test box.
>>
>> The base problem was this:
>>
>> # df -h /pgsql ; du -hxs /pgsql
>> Filesystem     Size    Used   Avail Capacity  Mounted on
>> /dev/da2s1d    128G    101G     17G    86%    /pgsql
>>  82G	/pgsql
>>
>> df says 101 GB used, but du only finds 82 GB, and fstat cannot find any
>> open files that are unreferenced in the file system. Stopping postgresql
>> does not help. It seems the OS is leaking inode references.
>>
>> FreeBSD 9.1, postgresql 9.2.3 from port.
>>
>> I ran the suggested commands (in the attached diskspacecheck) before
>> stopping postgresql (before.log), after stopping postgresql but before
>> unmounting /pgsql (before2.log), and then I unmounted /pgsql (I had to run
>> umount -f /pgsql, and it took about 20 seconds). I did not enter
>> single-user mode, since I really did not have to this time. (On the
>> production server, the disk is /usr, so that will require more shutting
>> down...)
>>
>> I've attached the logs here. Hope it helps!
>>
>> The commands run in diskspacecheck are
>> #!/bin/sh
>> df -ih /pgsql
>> vmstat -z
>> vmstat -m
>> sysctl debug
>> fstat -f /pgsql
>>
>> as suggested by Kirk.
>
> Your results are very enlightening. Especially the fact that you have
> to do a forcible unmount of the filesystem. What that tells me is that
> somehow we are getting vnodes that have phantom references. That is,
> there is some system call where we get a reference on a vnode (vref,
> vget, or similar) that does not ultimately have a corresponding drop
> of the reference (vrele, vput, or similar). The net effect is that
> the file is held open despite the fact that there are no longer any
> connections to it. When you do the forcible unmount, the kernel walks
> the list of vnodes associated with the filesystem and does a vgone on
> each of them. That causes each to be inactivated which then triggers
> the release of their associated disk space. The reason that the unmount
> takes 20 seconds is to process all the releasing of the space. My guess
> is that there is an error path in some system call that is missing the
> vrele or vput.
>
> Assuming that you are able to run some more tests on your test machine,
> the next step in narrowing down the set of code to look at is to try
> running your system with soft updates disabled. The idea is to find out
> whether the mismatched references are in the soft updates code or are
> in one of the filesystem system calls themselves. To disable soft updates
> run the command `tunefs -n disable /pgsql' on the unmounted /pgsql
> filesystem. If the system then runs without the problem, I will know
> to search the soft updates code. If the problem persists, then I'll
> know to look in the system calls themselves. You may want to do some
> preliminary tests to see how quickly the problem manifests itself.
> You can do this by running it for a short time (10 minutes say) and
> then checking to see if you need to do a forcible unmount of the
> filesystem. Once you establish how long you have to run before you
> reliably have to do a forcible unmount, you will know how long to
> run the test with soft updates turned off. If you find that running
> with soft updates turned off makes your application run too slowly
> you can mount your filesystem asynchronously. Note however, that you
> should not run asynchronously if the data on the filesystem is critical
> as you may end up with an unrecoverable filesystem after a power failure
> or system crash. So only run asynchronously if you can afford to lose
> your filesystem.
>
> Finally, it would be helpful if you could add two more commands to
> your diskspacecheck.sh script:
>
> 	sysctl -a | egrep vnode
> 	mount -v
>
> The first shows the vnode usage and the second shows the operational
> state of your filesystems.
>
> 	Kirk McKusick
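
Before starting a test run it may be worth confirming that soft updates
really are off for the filesystem. A minimal sketch, assuming that mount -v
prints one line per filesystem in the usual FreeBSD form
"/dev/da2s1d on /pgsql (ufs, local, soft-updates, ...)". The function name
softupdates_state is made up for illustration and is not part of any tool.

```sh
#!/bin/sh
# softupdates_state MOUNTPOINT   (hypothetical helper, not a system command)
# Reads `mount -v` style output on stdin and prints "enabled", "disabled",
# or "not-mounted" for the given mount point.
softupdates_state() {
    mp=$1
    state=not-mounted
    while IFS= read -r line; do
        case $line in
            *" on $mp ("*)
                # Assumption: the option list contains the word
                # "soft-updates" when soft updates are active.
                case $line in
                    *soft-updates*) state=enabled ;;
                    *) state=disabled ;;
                esac
                ;;
        esac
    done
    echo "$state"
}
```

Usage would be something like: mount -v | softupdates_state /pgsql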

OK, I have now turned off soft updates. This is on the test server. It is 
not as busy as the production machine, but I'll keep an eye on it and will 
mail new results as soon as I see evidence either that soft updates are
the culprit or that they are not.
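
One way to keep an eye on it is to log the gap between what df and du
report for the filesystem. A minimal sketch, assuming df -kP prints the used
1K-blocks in the third column of its second line and du -skx prints the
total in its first column. The function names leak_gap_kb and check_mount
are illustrative only.

```sh
#!/bin/sh
# leak_gap_kb DF_USED_KB DU_USED_KB   (hypothetical helper)
# Prints how many KB df counts as used beyond what du can find.
leak_gap_kb() {
    echo $(( $1 - $2 ))
}

# check_mount MOUNTPOINT   (hypothetical helper)
# Prints one line with a UTC timestamp, df usage, du usage and the gap,
# all in KB.
check_mount() {
    mp=$1
    # -P keeps each filesystem on one line even with long device names.
    df_kb=$(df -kP "$mp" | awk 'NR == 2 { print $3 }')
    # -x keeps du on this filesystem, matching the du -hxs used earlier.
    du_kb=$(du -skx "$mp" | awk '{ print $1 }')
    gap_kb=$(leak_gap_kb "$df_kb" "$du_kb")
    echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') $mp df=$df_kb du=$du_kb gap=$gap_kb"
}
```

Calling check_mount /pgsql from cron or a sleep loop and appending the
output to a log gives a simple trend. A small constant gap is expected from
filesystem metadata; a gap that keeps growing while fstat shows nothing
open would match the symptom described in this thread.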

FWIW, I attach the script from this remount process as well, which includes 
sysctl -a | grep vnode ; mount -v. Note that it is all in one script file 
this time.
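
For readers without the attachment, here is a rough sketch of what such a
combined script can look like. It is not the attached file. The command
list is the one quoted in this thread plus the two additions Kirk asked
for; the function names and the "===" banners are illustrative assumptions.

```sh
#!/bin/sh
# diskspacecheck_commands MOUNTPOINT   (hypothetical helper)
# Prints the diagnostic commands, one per line, for the given mount point.
# Mount points containing whitespace are not handled in this sketch.
diskspacecheck_commands() {
    mp=$1
    cat <<EOF
df -ih $mp
vmstat -z
vmstat -m
sysctl debug
fstat -f $mp
sysctl -a | egrep vnode
mount -v
EOF
}

# run_diskspacecheck MOUNTPOINT   (hypothetical helper)
# Runs each command in turn, printing a banner first so the sections of a
# single log file can be told apart.
run_diskspacecheck() {
    diskspacecheck_commands "$1" | while IFS= read -r cmd; do
        echo "=== $cmd"
        # stdin is redirected so a command cannot swallow the command list.
        sh -c "$cmd" </dev/null 2>&1
    done
}
```

It could be run as run_diskspacecheck /pgsql > before.log at each stage
(before stopping postgresql, after stopping it, and after the remount).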

Cheers,
Palle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: second_test
Type: application/octet-stream
Size: 119120 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20130602/9711f950/attachment-0001.obj>