Re: git: 867c27c23a5c - main - nfscl: Change IO_APPEND writes to direct I/O

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Thu, 16 Dec 2021 14:58:23 UTC
Kostik wrote:
>On Wed, Dec 15, 2021 at 04:39:28PM +0000, Rick Macklem wrote:
>> The branch main has been updated by rmacklem:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=867c27c23a5c469b27611cf53cc2390b5a193fa5
>>
>> commit 867c27c23a5c469b27611cf53cc2390b5a193fa5
>> Author:     Rick Macklem <rmacklem@FreeBSD.org>
>> AuthorDate: 2021-12-15 16:35:48 +0000
>> Commit:     Rick Macklem <rmacklem@FreeBSD.org>
>> CommitDate: 2021-12-15 16:35:48 +0000
>>
>>     nfscl: Change IO_APPEND writes to direct I/O
>>
>>     IO_APPEND writes have always been very slow over NFS, due to
>>     the need to acquire an up to date file size after flushing
>>     all writes to the NFS server.
>>
>>     This patch switches the IO_APPEND writes to use direct I/O,
>>     bypassing the buffer cache.  As such, flushing of writes
>>     normally only occurs when the open(..O_APPEND..) is done.
>>     It does imply that all writes must be done synchronously
>>     and must be committed to stable storage on the file server
>>     (NFSWRITE_FILESYNC).
>>
>>     For a simple test program that does 10,000 IO_APPEND writes
>>     in a loop, performance improved significantly with this patch.
>>
>>     For a UFS exported file system, the test ran 12x faster.
>>     This drops to 3x faster when the open(2)/close(2) are done
>>     for each loop iteration.
>>     For a ZFS exported file system, the test ran 40% faster.
>>
>>     The much smaller improvement may have been because the ZFS
>>     file system I tested against does not have a ZIL log and
>>     does have "sync" enabled.
>>
>>     Note that IO_APPEND write performance is still much slower
>>     than when done on local file systems.
>>
>>     Although this is a simple patch, it does result in a
>>     significant semantics change, so I have given it a
>>     large MFC time.
>
>How is the buffer cache coherency handled then?
>Imagine that another process either reads from this file, or even has it
>mapped.  What ensures that reads and the page cache see the data written
>via the direct path?

Well, for the buffer cache case, there is code near the beginning of
ncl_write() (the NFS client's VOP_WRITE()) that calls ncl_vinvalbuf() for
the IO_APPEND case. As such, any data in the buffer cache gets invalidated
whenever an append write occurs.
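
To make that flow concrete, here is roughly what the path looks like
(a simplified sketch from memory, not the exact source; locking,
attribute cache handling, and most error paths are omitted):

    /*
     * Simplified sketch of the logic near the start of ncl_write() in
     * sys/fs/nfsclient/nfs_clbio.c; not the exact source.
     */
    int
    ncl_write(struct vop_write_args *ap)
    {
        struct vnode *vp = ap->a_vp;
        struct nfsnode *np = VTONFS(vp);
        struct uio *uio = ap->a_uio;
        struct thread *td = uio->uio_td;
        int error, ioflag = ap->a_ioflag;

        if ((np->n_flag & NMODIFIED) != 0 && (ioflag & IO_APPEND) != 0) {
            /*
             * Write back and invalidate all cached buffers, so the
             * append offset and any later buffer cache reads are
             * consistent with the data on the server.
             */
            error = ncl_vinvalbuf(vp, V_SAVE, td, 1);
            if (error != 0)
                return (error);
        }

        /* With this commit, IO_APPEND writes also take the direct path. */
        if ((newnfs_directio_enable || (ioflag & IO_APPEND) != 0) &&
            (ioflag & (IO_DIRECT | IO_APPEND)) != 0 && vp->v_type == VREG)
            return (nfs_directio_write(vp, uio, ap->a_cred, ioflag));

        /* ... remainder of the buffered write path ... */
        return (0);
    }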

But, now that I look at it, it does not do anything w.r.t. mmap'd files.
(The direct I/O code has been there for a long time, but it isn't enabled
 by default, so it probably doesn't get tested much. There is also a sysctl,
 vfs.nfs.nfs_directio_allow_mmap, that allows mmap'd access to files doing
 direct I/O; it is enabled by default. When that sysctl is disabled, the
 getpages/putpages operations fail for such files.)
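
For reference, the guard in ncl_getpages() looks something like this
(simplified from memory; ncl_putpages() has the same pattern):

    /*
     * Roughly the check at the start of ncl_getpages() in
     * sys/fs/nfsclient/nfs_clbio.c: when direct I/O is enabled but
     * mmap is not allowed for it, paging in from a non-cached
     * (direct I/O) vnode fails.
     */
    if (newnfs_directio_enable && !newnfs_directio_allow_mmap) {
        NFSLOCKNODE(np);
        if ((np->n_flag & NNONCACHE) != 0 && vp->v_type == VREG) {
            NFSUNLOCKNODE(np);
            printf("ncl_getpages: called on non-cacheable vnode\n");
            return (VM_PAGER_ERROR);
        }
        NFSUNLOCKNODE(np);
    }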

So, it looks like code to invalidate the mmap'd pages needs to be added
alongside that ncl_vinvalbuf() call?
--> I'll come up with a patch along the lines sketched below and then
    get you to review it.
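
Roughly, I am thinking of something like this, called where
ncl_vinvalbuf() is done for the IO_APPEND case (just a sketch to show
the idea; the helper name is made up and it is untested):

    /*
     * Hypothetical sketch of the extra page invalidation; not the
     * actual patch.  It cleans and then frees any resident pages
     * backing the vnode, so mmap'd readers re-fault and fetch fresh
     * data from the server.
     */
    static void
    nfs_directio_invalidate_pages(struct vnode *vp)
    {
        vm_object_t obj = vp->v_object;

        if (obj == NULL)
            return;
        VM_OBJECT_WLOCK(obj);
        /* Write any dirty pages back to the server first... */
        vm_object_page_clean(obj, 0, 0, OBJPC_SYNC);
        /* ...then discard them so later accesses re-fault. */
        vm_object_page_remove(obj, 0, 0, 0);
        VM_OBJECT_WUNLOCK(obj);
    }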

Thanks for pointing this out, rick