RFC: NFS client patch to reduce sychronous writes

Thu Nov 28 07:24:45 UTC 2013

On Wed, Nov 27, 2013 at 07:19:47PM -0500, Rick Macklem wrote:
> Kirk wrote:
> > > Date: Wed, 27 Nov 2013 17:50:48 -0500 (EST)
> > > From: Rick Macklem <rmacklem at uoguelph.ca>
> > > To: Konstantin Belousov <kostikbel at gmail.com>
> > > Subject: Re: RFC: NFS client patch to reduce sychronous writes
> > > 
> > > Kostik wrote:
> > >> Sorry, I do not understand the question. mmap(2) itself does not
> > >> change
> > >> file size.  But if mmaped area includes the last page, I still
> > >> think
> > >> that the situation I described before is possible.
> > > 
> > > Yes, I'll need to look at this. If it is a problem, all I can think
> > > of
> > > is bzeroing all new pages when they're allocated to the buffer
> > > cache.
> > > 
> > > Thanks for looking at it, rick
> > > ps: Btw, jhb@'s patch didn't have the bzeroing in it.
> > 
> > The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS
> > for this problem and it killed write performance of the filesystem
> > by nearly half. We corrected this by only doing the bzero when the
> > file is mmap'ed which helped things considerably (since most files
> > being written are not also bmap'ed).
> > 
> > 	Kirk
> > 
> Ok, thanks. I've been trying to reproduce the problem over NFS and
> haven't been able to break my patch. I was using the attached trivial
> test program and would simply make a copy of the source file (529 bytes)
> to test on. I got the same results both locally and over NFS:
> - built without -DWRITEIT, the setting of a value after EOF would be
>   lost, because nothing grew the file from 529 bytes to over 4080bytes.
> - built with -DWRITEIT, both the 'A' and 'B' are in the result, since
>   my patch bzeros the grown segment in the write(2) syscall.
> 
> - If I move the write (code in #ifdef WRITEIT) to after the "*cp"
>   of the mapped page, the 'A' assigned to "*cp" gets lost for
>   both UFS and NFS.
>   Is this correct behaviour?
> 
> If it is correct behaviour, I can't see how the patch is broken, but
> if you think it might still be, I'll look at doing what Kirk suggests,
> which is bzeroing all new buffer cache pages when the file is mmap()d.
> 
Replying there, since text description is more informative than the code.

You cannot get the situation I described, with single process.
You should have a writer in one thread, and reader through the mmaped
area in another.  Even than, the race window is thin.

Let me describe the issue which could exist one more time:

Thread A (writer) issued write(2).  The kernel does two things:
1. zeroes part of the last buffer of the affected file.
2. kernel uiomove()s the write data into the buffer b_data.

Now, assume that thread B has the same file mmaped somewhere, and
accesses the page of the buffer after the [1] but before [2].  Than,
it would see zeroes instead of the valid content.

I said that this breaks write/mmap consistency, since thread B can
see a content in the file which was never written there.  The condition
is transient, it self-repairs after thread A passes point 2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20131128/4d0f6653/attachment.sig>