NFS calculation of max commit size

John Baldwin jhb at freebsd.org
Tue Aug 16 13:50:53 UTC 2011


On Monday, August 15, 2011 10:25:54 pm Jeremy Chadwick wrote:
> On Mon, Aug 15, 2011 at 06:58:14PM -0400, Rick Macklem wrote:
> > John Baldwin wrote:
> > > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote:
> > > > A recent PR (kern/159351) noted that the following
> > > > calculation results in a divide-by-zero when
> > > > desiredvnodes < 1000.
> > > >
> > > > 	nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> > > >
> > > > Just fixing the divide-by-zero is easy enough, but I'm not
> > > > sure what this calculation is trying to do. Making it a fraction
> > > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of
> > > > bytes of uncommitted data in the NFS client's buffer cache blocks,
> > > > if I understand it correctly), but why divide it by
> > > >
> > > >                 (desiredvnodes / 1000) ??
> > > >
> > > > Maybe thinking that fewer vnodes means sharing it with fewer
> > > > other file systems or ???
> > > >
> > > > Anyhow, it seems to me that the formula is bogus for small
> > > > values of desiredvnodes (for example desiredvnodes == 1500
> > > > implies nm_wcommitsize == hibufspace, which sounds too large
> > > > to me).
> > > >
> > > > I'm thinking that putting an upper limit of 10% of hibufspace
> > > > might make sense, i.e., change the above to:
> > > >
> > > > 	if (desiredvnodes >= 11000)
> > > > 		nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
> > > > 	else
> > > > 		nmp->nm_wcommitsize = hibufspace / 10;
> > > >
> > > > Anyone have comments or insight into this calculation?
> > > >
> > > > rick
> > > > ps: jhb, I hope you don't mind. I emailed you first and then
> > > >     thought others might have some ideas, too.
> > > 
> > > Oh no, this is fine. A broader discussion is probably warranted. I honestly
> > > don't know what the goal is. I do think it is an attempt to share with other
> > > file systems, but I'm not sure how desiredvnodes / 1000 is useful for that.
> > > It also seems that we can end up setting this woefully low as well. That is,
> > > I wonder if we need a minimum of 10% of hibufspace so that it can scale
> > > between 10% and 90% of hibufspace (but I'm not sure what you would use to
> > > pick the scaling factor sanely). To my mind what you really want to do is
> > > something like 'hibufspace / (number of active mounts)', but that will not
> > > really work correctly unless we recalculate the value on each mount and
> > > unmount operation.
> > > 
> > > --
> > > John Baldwin
> > Btw, this was done by r147280 6.5 years ago, so the formula doesn't seem
> > to be causing a lot of grief. Also of some interest is the fact that
> > wcommitsize appears to have been settable on a per-mount-point basis until
> > mount_nfs(8) was converted to nmount(2). { There is no nmount option to set it. }
> > 
> > Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects
> > how much write behind happens. This, in turn, affects how bursty (is this a real
> > word? hopefully you get what I mean?) the write traffic to the server is.
> > 
> > What I'm not sure about is what happens when multiple mounts use up the entire
> > buffer cache with write behinds. I'll try a little experiment to see if I
> > can find that out. (If making it large isn't detrimental, then I tend to
> > agree that the above sets nm_wcommitsize very small.)
> > 
> > Since "desiredvnodes" will seldom be less than 1000, I'm not going to
> > rush to a solution.
> > 
> > Anyone who has insight into what this formula should be, please let us know.
> 
> The commit message tries to explain it, but it's more than just a
> one-line change.
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsclient/nfs_vfsops.c#rev1.177
> 
> There's also an associated PR:
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=79208

The commit added the limit, which is sensible, but it doesn't explain the logic
for how the limit is computed (that is, why it uses desiredvnodes / 1000).
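
For reference, here is a minimal userland sketch of the two calculations
discussed in this thread: the existing formula and the 10% cap Rick proposed.
The function names and the -1 sentinel are hypothetical and exist only for
illustration; in the kernel, hibufspace and desiredvnodes are globals and the
result is stored in nmp->nm_wcommitsize. The integer division is the point:
desiredvnodes / 1000 truncates to 0 below 1000 and to 1 anywhere from 1000
through 1999.

```c
#include <stdint.h>

/*
 * Existing formula, with a guard added for illustration only.
 * Returns -1 (a hypothetical sentinel) when desiredvnodes < 1000,
 * which is where the unguarded expression divides by zero.
 */
int64_t
wcommitsize_existing(int64_t hibufspace, int64_t desiredvnodes)
{
	int64_t divisor = desiredvnodes / 1000;	/* truncates toward zero */

	if (divisor == 0)
		return (-1);
	return (hibufspace / divisor);
}

/*
 * Proposed variant from earlier in the thread: never exceed 10% of
 * hibufspace. For desiredvnodes >= 11000 the divisor is at least 11,
 * so the first branch always yields less than hibufspace / 10.
 */
int64_t
wcommitsize_capped(int64_t hibufspace, int64_t desiredvnodes)
{
	if (desiredvnodes >= 11000)
		return (hibufspace / (desiredvnodes / 1000));
	return (hibufspace / 10);
}
```

With desiredvnodes == 1500 the existing formula returns all of hibufspace,
while the capped variant returns a tenth of it. Neither version explains why
the divisor should be desiredvnodes / 1000 in the first place.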

-- 
John Baldwin

