[RFC] Should vfs.nfsrv.async be implemented for new NFS server?

Rick Macklem rmacklem at uoguelph.ca
Mon Nov 7 00:47:17 UTC 2011


Josh Paetzel wrote:
> On 11/06/11 07:34, Rick Macklem wrote:
> > Ronald Klop wrote:
> >> On Sun, 06 Nov 2011 02:18:05 +0100, Rick Macklem
> >> <rmacklem at uoguelph.ca>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Josh Paetzel pointed out that vfs.nfsrv.async doesn't exist
> >>> for the new NFS server.
> >>>
> >>> I don't think I had spotted this before, but when I looked I
> >>> saw that, when vfs.nfsrv.async is set non-zero in the old server,
> >>> it returns FILESYNC (which means the write has been committed to
> >>> non-volatile storage) even when it hasn't actually done that.
> >>>
> >>> This can improve performance, but has some negative implications:
> >>> - If the server crashes before the write is committed to
> >>>   non-volatile storage, the file modification will be lost.
> >>>   (When a server replies UNSTABLE to a write, the client holds
> >>>    onto the data in its cache and does the write again if the
> >>>    server crashes/reboots before the client does a Commit RPC
> >>>    for the file. However, a reply of FILESYNC tells the client
> >>>    it can forget about the write, because it is done.)
> > >>> - Because of the above, replying FILESYNC when the data is not
> > >>>   yet committed to non-volatile (also referred to as stable)
> > >>>   storage is a violation of RFC 1813.
> >>
> >> Just out of curiosity. Why would lying about FILESYNC improve
> >> performance
> >> over UNSTABLE? The server does the same work. Only the client holds
> >> data
> >> longer in memory. I only see impact if the client has just a little
> >> bit of
> >> memory.
> >>
> >> Ronald.
> > Well, I'm not sure I have an answer. Josh noted that it makes a big
> > difference for them. Maybe he can elaborate?
> >
> 
> I'll test it out and report back in the next week or so.
> 
> In 8.x, setting the async sysctl was the difference between
> 80-100MB/sec
> and 800 MB/sec (Yes, MegaBytes!) using a variety of different clients,
> including the VMware ESXi 4.x client, Xen 5.6 client, various Linux
> clients, and the FreeBSD client. I'll note that 800 MB/sec is getting
> close to the underlying filesystem performance, so it's likely that
> the filesystem is the performance bottleneck in that case. 80-100
> MB/sec is basically gigE performance.
> 
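For anyone wanting to reproduce the comparison above on an 8.x (old NFS
server) box, the knob in question is toggled like this (a sketch; the
default of 0 is my assumption, check `sysctl -d vfs.nfsrv.async` on your
system):

```shell
# Old (8.x) NFS server knob being compared above.
# Warning: with async=1 the server replies FILESYNC for writes it has
# not yet committed to stable storage, violating RFC 1813, so a server
# crash can silently lose data the client believes is safe.
sysctl vfs.nfsrv.async      # query the current value (assumed default: 0)
sysctl vfs.nfsrv.async=1    # enable the async shortcut
```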
Just wondering...are these tests writing a file larger than the buffer
cache can hold?

rick
> I can make hardware available if anyone is curious at poking at this,
> we
> have the ability to set up tests with quad gigE LACP, 10 gigE, and
> numerous clients.
> 
> > One additional effect is that the client in head must do a
> > synchronous write (with FILESYNC, waiting for the RPC reply) before
> > it can modify a region of the same buffer that is non-contiguous
> > with the old dirty byte region. (This happens frequently during
> > builds, done mostly by the loader, I think?)
> > If the server replies FILESYNC, then the old dirty byte region is
> > done
> > (ie. no longer a dirty byte region) so the client doesn't
> > have to do the synchronous write described above.
> > I hope that the experimental patch I made available a few days ago,
> > along with work jhb@ is doing will eventually fix this for the
> > FreeBSD
> > client, but it won't be in head anytime soon (and who knows what
> > other clients do?).
> >
> > rick
> >
> 
> 
> --
> Thanks,
> 
> Josh Paetzel

