leaking lots of unreferenced inodes (pg_xlog files?)

Palle Girgensohn girgen at FreeBSD.org
Tue Jul 16 22:47:27 UTC 2013


Kirk McKusick wrote:
>> Date: Mon, 15 Jul 2013 10:51:10 +0100
>> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>> From: Dan Thomas <godders at gmail.com>
>> To: Kirk McKusick <mckusick at mckusick.com>
>> Cc: Palle Girgensohn <girgen at freebsd.org>, freebsd-fs at freebsd.org,
>>     Jeff Roberson <jroberson at jroberson.net>, Julian Akehurst <julian at pingpong.se>
>> 
>> On 11 June 2013 01:17, Kirk McKusick <mckusick at mckusick.com>
>> wrote:
>>> OK, good to have it narrowed down. I will look to devise some
>>> additional diagnostics that will hopefully help tease out the
>>> bug. I'll get back to you soon.
>> Hi,
>> 
>> Is there any news on this issue? We're still running several
>> servers that are exhibiting this problem (most recently, one that
>> seems to be leaking around 10 GB/hour), and it's getting to the
>> point where we're looking at moving to a different OS until it's
>> resolved.
>> 
>> We have access to several production systems with this problem and
>> (at least from time to time) will have systems with a significant
>> leak on them that we can experiment with. Is there any way we can
>> assist with tracking this down? Any diagnostics or testing that
>> would be useful?
>> 
>> Thanks, Dan
> 
> Hi Dan (and Palle),
> 
> Sorry for the long delay with no help / news. I have gotten 
> side-tracked on several projects and have had little time to try and
> devise some tests that would help find the cause of the lost space.
> It almost certainly is a one-line fix (a missing vput or vrele
> probably in some error path), but finding where it goes is the hard
> part :-)
> 
> I have had little success in inserting code that tracks reference 
> counts (too many false positives). So, I am going to need some help 
> from you to narrow it down. My belief is that there is some set of 
> filesystem operations (system calls) that are leading to the
> problem. Notably, a file is being created, data put into it, then the
> file is deleted (either before or after being closed).  Somehow a
> reference to that file is persisting despite there being no valid
> reference to it. Hence the filesystem thinks it is still live and is
> not deleting it. When you do the forcible unmount, these files get 
> cleared and the space shows back up.
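
A minimal sketch of the suspected sequence; the path, the buffer size,
and the unlink-before-close ordering are placeholders, with the
close-before-unlink variant only noted in a comment:

/*
 * Create a file, write data into it, then delete it either before or
 * after closing: the pattern suspected of leaving a stray reference.
 */
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char buf[8192];				/* placeholder amount of data */
	const char *path = "lostfile.tmp";	/* placeholder path */
	int fd;

	memset(buf, 'x', sizeof(buf));

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
	if (fd == -1)
		err(1, "open");
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		err(1, "write");

	/* Variant A: delete while still open, then close. */
	if (unlink(path) == -1)
		err(1, "unlink");
	if (close(fd) == -1)
		err(1, "close");

	/* Variant B would swap the unlink() and close() calls. */
	return (0);
}
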
> 
> What I need to devise is a small test program doing the set of system
> calls that cause this to happen. The way that I would like to try and
> get it is to have you `ktrace -i' your application and then run your
> application just long enough to create at least one of these lost
> files. The goal is to minimize the amount of ktrace data through
> which we need to sift.
> 
> In preparation for doing this test you need to have a kernel compiled
> with `options DIAGNOSTIC' or, if you prefer, just add `#define
> DIAGNOSTIC 1' to the top of sys/kern/vfs_subr.c. You will know you
> have at least one offending file when you try to unmount the affected
> filesystem and find it busy. Before doing the `umount -f', enable
> busy printing using `sysctl debug.busyprt=1'. Then capture the
> console output which will show the details of all the vnodes that had
> to be forcibly flushed. Hopefully we will then be able to correlate
> them back to the files (NAMI in the ktrace output) with which they
> were associated. We may need to augment the NAMI data with the inode
> number of the associated file to make the association with the
> busyprt output. Anyway, once we have that, we can look at all the
> system calls done on those files and create a small test program that
> exhibits the problem. Given a small test program, Jeff or I can track
> down the offending system call path and nail this pernicious bug once
> and for all.
> 
> Kirk McKusick
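
To help tie files back to the busyprt output, a hypothetical helper
along these lines could print the inode number (st_ino) of a file it
creates, for matching against the "ino NNN, on dev ..." lines; the
program name and usage are made up for illustration:

/*
 * Open (creating if needed) the named file and print its inode number,
 * for correlation with the vnodes reported by busyprt at umount -f time.
 */
#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct stat sb;
	int fd;

	if (argc != 2)
		errx(1, "usage: inodeno <file>");
	fd = open(argv[1], O_CREAT | O_RDWR, 0644);
	if (fd == -1)
		err(1, "open %s", argv[1]);
	if (fstat(fd, &sb) == -1)
		err(1, "fstat");
	printf("%s: ino %ju\n", argv[1], (uintmax_t)sb.st_ino);
	close(fd);
	return (0);
}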

Hi,

I have run ktrace -i on pg_ctl (which forks off all the PostgreSQL
processes) and I got two "busy" files that were "lost" after a few
hours. dmesg reveals this:

vflush: busy vnode
0xfffffe067cdde960: tag ufs, type VREG
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags (VI(0x200))
 VI_LOCKed    v_object 0xfffffe0335922000 ref 0 pages 0
    lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723)
	ino 11047146, on dev da2s1d
vflush: busy vnode
0xfffffe039f35bb40: tag ufs, type VREG
    usecount 1, writecount 0, refcount 3 mountedhere 0
    flags (VI(0x200))
 VI_LOCKed    v_object 0xfffffe03352701d0 ref 0 pages 0
    lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723)
	ino 11045961, on dev da2s1d


I had to umount -f, so they were "lost".

So, now I have 55 GB of ktrace output... ;)  Is there anything I can do to
filter it, or shall I compress it and put it on a web server for you to
fetch as it is?

Palle


