kern/144330: [nfs] mbuf leakage in nfsd with zfs
Rick Macklem
rmacklem at uoguelph.ca
Mon Mar 22 14:00:13 UTC 2010
The following reply was made to PR kern/144330; it has been noted by GNATS.
From: Rick Macklem <rmacklem at uoguelph.ca>
To: Daniel Braniss <danny at cs.huji.ac.il>
Cc: Mikolaj Golub <to.my.trociny at gmail.com>,
Jeremy Chadwick <freebsd at jdc.parodius.com>, freebsd-fs at FreeBSD.org,
Kai Kockro <kkockro at web.de>, bug-followup at FreeBSD.org,
gerrit at pmp.uni-hannover.de
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Date: Mon, 22 Mar 2010 10:04:46 -0400 (EDT)
On Mon, 22 Mar 2010, Daniel Braniss wrote:
>
> well, it's much better!, but no cookies yet :-)
>
Well, that's good news. I'll try to get dfr to review it and then
commit it. Thanks, Mikolaj, for finding this.
> from comparing graphs in
> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
> store-01-e.ps: a production server running newfsd - now up almost 20 days
> notice that the average used mbuf is below 1000!
>
> store-02.ps: kernel without last patch, classic nfsd
> the leak is huge.
>
> store-02++.ps: with latest patch
> the leak is much smaller but I see 2 issues:
> - the initial leap to over 2000, then a smaller leak.
The initial leap doesn't worry me. That's just a design constraint.
A slow leak after that is still a problem. (I might have seen the
slow leak in testing here. I'll poke at it and see if I can reproduce
that.)
>
> could someone explain replay_prune() to me?
>
I just looked at it and I think it does the following:
- when it thinks the cache is too big (either too many entries
  or too much mbuf data) it loops around until:
  - the cache is no longer too big, or it can't free any more entries
    (when an entry is freed, rc_size and rc_count are
    reduced)
    (the loop runs from the end of the tailq, so it is freeing
    the least recently used entries first)
- the test for rce_repmsg.rm_xid != 0 avoids freeing entries
  that are in progress, since rce_repmsg is all zeroed until
  the reply has been generated
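To make that description concrete, here is a rough userland sketch of the loop as I read it. This is a simplified model, not the kernel code: the struct layouts, the names cache_prune(), cache_insert(), cache_free_entry() and too_big(), and the "greater than" thresholds are all assumptions made for illustration. Only the ideas of an entry count, a byte total, walking from the tail, and skipping entries whose xid is still zero come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache entry; not the kernel structure. */
struct entry {
    struct entry *prev, *next; /* position in the LRU list */
    uint32_t xid;              /* 0 while the reply is still being generated */
    size_t size;               /* bytes of reply data held by this entry */
};

/* Hypothetical stand-in for the cache; count/size model rc_count/rc_size. */
struct cache {
    struct entry *head, *tail; /* head = most recent, tail = least recent */
    int count;
    size_t size;
    int max_count;             /* assumed limit on number of entries */
    size_t max_size;           /* assumed limit on cached bytes */
};

static int too_big(const struct cache *c)
{
    return c->count > c->max_count || c->size > c->max_size;
}

/* Insert a new entry at the head (most recently used end). */
static struct entry *cache_insert(struct cache *c, uint32_t xid, size_t size)
{
    struct entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    e->xid = xid;
    e->size = size;
    e->next = c->head;
    if (c->head != NULL)
        c->head->prev = e;
    else
        c->tail = e;
    c->head = e;
    c->count++;
    c->size += size;
    return e;
}

/* Unlink and free one entry, reducing the count and byte total. */
static void cache_free_entry(struct cache *c, struct entry *e)
{
    assert(c->count > 0 && c->size >= e->size);
    if (e->prev != NULL)
        e->prev->next = e->next;
    else
        c->head = e->next;
    if (e->next != NULL)
        e->next->prev = e->prev;
    else
        c->tail = e->prev;
    c->count--;
    c->size -= e->size;
    free(e);
}

/* Free least recently used completed entries until the cache fits,
 * or until a full pass finds nothing that can be freed. */
static void cache_prune(struct cache *c)
{
    int freed_one;

    if (!too_big(c))
        return;
    do {
        freed_one = 0;
        for (struct entry *e = c->tail; e != NULL; e = e->prev) {
            if (e->xid != 0) {      /* skip in-progress entries */
                cache_free_entry(c, e);
                freed_one = 1;
                break;              /* restart the scan from the tail */
            }
        }
    } while (freed_one && too_big(c));
}
```

Note that in this model an in-progress entry at the tail is simply stepped over, so the loop terminates even if every remaining entry is still in progress.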
I did notice that the call to replay_prune() from replay_setsize() does
not lock the mutex before calling it, so it doesn't look SMP-safe to me
for this case, but I doubt that would cause a slow leak. (I think
replay_setsize() is only called when the number of mbuf clusters in the
kernel changes; the missing lock might cause a kernel crash if the tailq
wasn't in a consistent state as replay_prune() rattled through the list
in the loop.)
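The kind of change I have in mind for the locking would look roughly like the userland sketch below. It uses a pthread mutex as a stand-in for the kernel mutex, and the struct, the "owned" debug flag and every function name here are hypothetical, so treat it as an illustration of the pattern (take the lock, change the limit, prune, drop the lock) rather than as a patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified model of the cache; not the kernel structure. */
struct cache {
    pthread_mutex_t lock; /* protects every field below */
    int owned;            /* debug flag: nonzero while the lock is held */
    size_t size;          /* bytes currently cached */
    size_t max_size;      /* pruning threshold */
};

static void cache_lock(struct cache *c)
{
    pthread_mutex_lock(&c->lock);
    c->owned = 1;
}

static void cache_unlock(struct cache *c)
{
    c->owned = 0;
    pthread_mutex_unlock(&c->lock);
}

/* Caller must hold the lock. The assert plays the role that a
 * lock-ownership assertion would play in kernel code. */
static void cache_prune_locked(struct cache *c)
{
    assert(c->owned);
    if (c->size > c->max_size)
        c->size = c->max_size; /* stand-in for freeing entries */
}

/* Change the limit and prune while holding the lock, so no other
 * thread can see or modify the list in the middle of the pass. */
static void cache_setsize(struct cache *c, size_t new_max)
{
    cache_lock(c);
    c->max_size = new_max;
    cache_prune_locked(c);
    cache_unlock(c);
}
```

Calling cache_prune_locked() without the lock trips the assert immediately, which is the sort of check that would have flagged the unlocked call path.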
rick