[RFC] Should vfs.nfsrv.async be implemented for new NFS server?

Josh Paetzel josh at tcbug.org
Sun Nov 6 16:41:22 UTC 2011


On 11/06/11 07:34, Rick Macklem wrote:
> Ronald Klop wrote:
>> On Sun, 06 Nov 2011 02:18:05 +0100, Rick Macklem
>> <rmacklem at uoguelph.ca> wrote:
>>
>>> Hi,
>>>
>>> Josh Paetzel pointed out that vfs.nfsrv.async doesn't exist
>>> for the new NFS server.
>>>
>>> I don't think I had spotted this before, but when I looked I
>>> saw that, when vfs.nfsrv.async is set non-zero in the old server,
>>> it returns FILESYNC (which means the write has been committed to
>>> non-volatile storage) even when it hasn't actually done that.
>>>
>>> This can improve performance, but has some negative implications:
>>> - If the server crashes before the write is committed to
>>>   non-volatile storage, the file modification will be lost.
>>>   (When a server replies UNSTABLE to a write, the client holds
>>>    onto the data in its cache and does the write again if the
>>>    server crashes/reboots before the client does a Commit RPC
>>>    for the file. However, a reply of FILESYNC tells the client
>>>    it can forget about the write, because it is done.)
>>> - Because of the above, replying FILESYNC when the data is not
>>>   yet committed to non-volatile (also referred to as stable)
>>>   storage is a violation of RFC 1813.
>>
>> Just out of curiosity. Why would lying about FILESYNC improve
>> performance over UNSTABLE? The server does the same work. Only the
>> client holds data longer in memory. I only see impact if the client
>> has just a little bit of memory.
>>
>> Ronald.
> Well, I'm not sure I have an answer. Josh noted that it makes a big
> difference for them. Maybe he can elaborate?
> 
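For anyone following along, the UNSTABLE/FILESYNC behaviour Rick describes
above can be sketched as a toy model. This is only an illustration under
stated assumptions: ToyServer, ToyClient, surviving_data and the
lie_about_filesync flag are hypothetical names, not code from either NFS
server, and the "verifier" is a simplified stand-in for the write verifier
in RFC 1813 that lets a client notice a server reboot.

```python
from enum import IntEnum


class StableHow(IntEnum):
    # stable_how values as defined for NFSv3 WRITE in RFC 1813
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


class ToyServer:
    """Hypothetical server: writes land in volatile memory until commit()."""

    def __init__(self, lie_about_filesync=False):
        self.lie = lie_about_filesync
        self.volatile = {}      # offset -> data, lost on crash
        self.stable = {}        # offset -> data, survives a crash
        self.boot_verifier = 1  # changes on every reboot

    def write(self, offset, data):
        self.volatile[offset] = data
        if self.lie:
            # Models the async sysctl: claim FILESYNC without committing.
            return StableHow.FILE_SYNC, self.boot_verifier
        return StableHow.UNSTABLE, self.boot_verifier

    def commit(self):
        self.stable.update(self.volatile)
        self.volatile.clear()
        return self.boot_verifier

    def crash_and_reboot(self):
        self.volatile.clear()
        self.boot_verifier += 1


class ToyClient:
    """Hypothetical client: keeps data cached until it is known to be stable."""

    def __init__(self, server):
        self.server = server
        self.uncommitted = {}   # offset -> (data, verifier seen at write time)

    def write(self, offset, data):
        how, verf = self.server.write(offset, data)
        if how != StableHow.FILE_SYNC:
            # UNSTABLE reply: hold on to the data until a Commit succeeds.
            self.uncommitted[offset] = (data, verf)
        # FILE_SYNC reply: the client forgets about the write.

    def commit(self):
        while self.uncommitted:
            verf = self.server.commit()
            # A changed verifier means the server rebooted: redo those writes.
            stale = {o: d for o, (d, v) in self.uncommitted.items() if v != verf}
            self.uncommitted.clear()
            for offset, data in stale.items():
                self.write(offset, data)


def surviving_data(lie):
    """Write once, crash the server before the Commit, then commit."""
    server = ToyServer(lie_about_filesync=lie)
    client = ToyClient(server)
    client.write(0, b"hello")
    server.crash_and_reboot()
    client.commit()
    return server.stable
```

In this model surviving_data(False) still ends up with the data on stable
storage, because the client rewrites it after the reboot, while
surviving_data(True) loses it, because the client dropped its copy as soon
as it saw the FILESYNC reply.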

I'll test it out and report back in the next week or so.

In 8.x, setting the async sysctl was the difference between 80-100 MB/sec
and 800 MB/sec (Yes, MegaBytes!) using a variety of different clients,
including the VMware ESXi 4.x client, Xen 5.6 client, various Linux
clients and the FreeBSD client.  I'll note that 800 MB/sec is getting
close to the underlying filesystem performance, so it's likely that the
bottleneck is the filesystem in that case.  80-100 MB/sec is
basically gigE performance.
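
As a rough sanity check on the "basically gigE" remark, the raw link
arithmetic can be sketched as below. The function name and the efficiency
parameter are illustrative assumptions; real NFS throughput also depends on
protocol overhead, which this does not model.

```python
def link_mb_per_sec(gbit_per_sec, efficiency=1.0):
    """Convert a link rate in Gbit/s to MB/s (1 MB = 10**6 bytes).

    efficiency is an assumed fraction of the raw rate left for payload;
    1.0 means no overhead at all.
    """
    bytes_per_sec = gbit_per_sec * 1e9 / 8
    return bytes_per_sec / 1e6 * efficiency


if __name__ == "__main__":
    for name, gbit in [("gigE", 1), ("quad gigE LACP", 4), ("10 gigE", 10)]:
        print(f"{name}: {link_mb_per_sec(gbit):.0f} MB/sec raw")
```

A single gigE link tops out at 125 MB/sec raw, so 80-100 MB/sec is 64-80%
of the wire rate, and 800 MB/sec is more than even four aggregated gigE
links (500 MB/sec raw) could carry.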

I can make hardware available if anyone is curious about poking at this;
we have the ability to set up tests with quad gigE LACP, 10 gigE, and
numerous clients.

> One additional effect is that the client in head must do a synchronous
> write (with FILESYNC and waiting for the RPC reply) before it can
> modify a region of the same buffer that is not contiguous with the
> old dirty byte region. (This happens frequently during builds, done
> mostly by the loader, I think?)
> If the server replies FILESYNC, then the old dirty byte region is done
> (i.e. no longer a dirty byte region), so the client doesn't
> have to do the synchronous write described above.
> I hope that the experimental patch I made available a few days ago,
> along with work jhb@ is doing will eventually fix this for the FreeBSD
> client, but it won't be in head anytime soon (and who knows what
> other clients do?).
> 
> rick
> 
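
The dirty byte region effect Rick describes can be sketched with another
toy model. It assumes, as the description above implies, that a buffer
tracks a single dirty byte range; ToyBuffer and forced_sync_writes are
hypothetical names and this is not the FreeBSD client code.

```python
class ToyBuffer:
    """One cached block that can track only a single dirty byte range."""

    def __init__(self):
        self.dirty = None      # (start, end) of the dirty range, or None
        self.sync_writes = 0   # synchronous writes forced so far

    def modify(self, start, end):
        if self.dirty is not None:
            d_start, d_end = self.dirty
            if start > d_end or end < d_start:
                # The new range does not touch the old dirty range, so the
                # old range must be written synchronously before it can be
                # replaced.
                self.sync_writes += 1
                self.dirty = None
            else:
                # Touching or overlapping ranges simply merge.
                start, end = min(start, d_start), max(end, d_end)
        self.dirty = (start, end)

    def write_reply(self, stable_how):
        # A FILESYNC reply means the range is done; an UNSTABLE reply
        # leaves it dirty until a later Commit.
        if stable_how == "FILESYNC":
            self.dirty = None


def forced_sync_writes(reply):
    """Modify three scattered ranges, with the server always giving `reply`."""
    buf = ToyBuffer()
    for start, end in [(0, 100), (500, 600), (2000, 2100)]:
        buf.modify(start, end)
        buf.write_reply(reply)  # reply to the asynchronous write of the range
    return buf.sync_writes
```

With UNSTABLE replies the second and third modifications each force a
synchronous write; with FILESYNC replies (honest or not) none are forced,
which is one way the async setting could help a client that behaves like
this model.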


-- 
Thanks,

Josh Paetzel


More information about the freebsd-fs mailing list