NFS client/buffer cache deadlock

Brian Fundakowski Feldman green at freebsd.org
Tue Apr 26 09:25:51 PDT 2005


On Tue, Apr 26, 2005 at 06:06:09PM +0200, Marc Olzheim wrote:
> On Tue, Apr 26, 2005 at 11:50:43AM -0400, Brian Fundakowski Feldman wrote:
> > > I'm okay with the fact that simultaneous huge writes to the same file
> > > over NFS could lead to corruption and that the exact outcome is
> > > undefined.
> > > 
> > > This is exactly how it was in FreeBSD 4.x and that's perfectly workable.
> > > 
> > > But that's just my way of looking at it and certainly not ideal. :-/
> > 
> > I don't know what you mean.  The exact same bug should exist in 4.x,
> > and should cause a system deadlock in exactly the same scenario.
> 
> I'm not sure you understand the "scenario". All I do is create a new
> file and writev 600 * 1MB to it. This creates a VFS hangup on FreeBSD
> 5.x after writing an amount of 2-100 MB (depending on how much memory is
> in the system), while 4.x just does what it is told and doesn't hangup.
> 
> I do not have any synchronisation problems.
> 
> See kern/79208
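
For anyone following along, the scenario described above can be sketched
roughly as follows.  This is not the test program from the PR; the function
name, the fill byte and the file mode are assumptions made for illustration,
and every vector points at the same buffer to keep user memory small.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` and issue a single writev() of
 * `count` vectors of `len` bytes each.  All vectors share one buffer.
 * Returns the writev() result, or -1 if setup fails.
 */
ssize_t
one_big_writev(const char *path, int count, size_t len)
{
	struct iovec *iov;
	char *buf;
	ssize_t n = -1;
	long iov_max = sysconf(_SC_IOV_MAX);
	int fd, i;

	/* Refuse vector counts the system cannot accept in one call. */
	if (count <= 0 || (iov_max > 0 && count > iov_max))
		return (-1);

	buf = malloc(len);
	iov = calloc((size_t)count, sizeof(*iov));
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (buf != NULL && iov != NULL && fd >= 0) {
		memset(buf, 'x', len);
		for (i = 0; i < count; i++) {
			iov[i].iov_base = buf;
			iov[i].iov_len = len;
		}
		/* One system call covering count * len bytes. */
		n = writev(fd, iov, count);
	}

	if (fd >= 0)
		close(fd);
	free(iov);
	free(buf);
	return (n);
}
```

Calling one_big_writev(path, 600, 1 << 20) with a path on an NFS mount
would correspond to the 600 * 1MB case; the mount point itself is left to
the reader.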

Then it sounds like, for whatever reason, FreeBSD 4.x isn't negotiating
NFSv3 properly and should be fixed.  This is fundamentally a deadlock
situation.  The write is a transaction, which requires that any part of
the write request can be retransmitted.  This can only be accomplished
by retaining the entire write contents for the duration of the operation.
You can ensure that this happens in only two ways:

1. Make a complete copy of the data.  This is what currently occurs:
   it gets stuffed into the buffer cache as the write happens.
2. Keep the data around synchronously -- by virtue of the write system
   call being used synchronously, the thread's VM context is around,
   and duplication need not occur.

I'm trying to fix all situations, not just yours.  I think I've
changed my mind about short writes being acceptable simply because
they cause detectable corruption: once it is detected, you still
have no knowledge of the exact location of the corruption.  So it's
really only a choice between forcing synchronous operation implicitly,
or failing explicitly by returning an error that says no data at all
was written.
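
To make the short-write case concrete, here is a minimal sketch of what a
caller has to do if writev() is allowed to return fewer bytes than were
requested.  The helper name is hypothetical, and treating a zero-byte
return as EIO is an assumption of this sketch, not something the kernel
mandates.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write everything described by iov[0..iovcnt) to
 * fd, resuming after short writes.  The iov array is modified in place.
 * Returns 0 on success, -1 on error with errno set.
 */
int
writev_all(int fd, struct iovec *iov, int iovcnt)
{
	while (iovcnt > 0) {
		ssize_t n = writev(fd, iov, iovcnt);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return (-1);
		}
		if (n == 0 && iov->iov_len != 0) {
			/* No progress at all; give up (assumed policy). */
			errno = EIO;
			return (-1);
		}

		size_t done = (size_t)n;

		/* Drop the vectors that were written completely. */
		while (iovcnt > 0 && done >= iov->iov_len) {
			done -= iov->iov_len;
			iov++;
			iovcnt--;
		}
		/* Step into a vector that was only partly written. */
		if (iovcnt > 0) {
			iov->iov_base = (char *)iov->iov_base + done;
			iov->iov_len -= done;
		}
	}
	return (0);
}
```

Note that the loop only helps when the caller is the sole writer: with
concurrent writers, the resumed portion can interleave with someone else's
data, which is the corruption problem described above.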

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green at FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\

