NFS client/buffer cache deadlock

Fri Apr 15 08:06:24 PDT 2005

On Fri, Apr 15, 2005 at 03:21:08PM +0200, Marc Olzheim wrote:
> On Fri, Apr 15, 2005 at 01:08:21AM -0400, Brian Fundakowski Feldman wrote:
> > I'll spare a lengthy write-up because I think the patch documents it well
> > enough.  It certainly appears to fix things here when doing very large
> > block-sized writes, but it also reduces the throughput with those block
> > sizes.  (I don't think there should be any difference when using reasonable
> > block sizes).
> 
> Is this supposed to fix kern/79208 ?

Yes, it does; would you like to try a more recent version of the patch?
It's actually against -STABLE, but it needs to be tested in -CURRENT if
it's going to make it into 5.x (or hopefully 5.4-RELEASE).

See: <http://green.homeunix.org/~green/nfs_client.deadlock.patch>

This also implements non-blocking writes for NFS clients and will
do the right thing (perform a continuous write, flushing as it goes,
and causing no drop in performance) for "non-atomic" I/O, but I
don't know of any interface that allows you to do non-atomic writes.

Buggy applications can break because of this.  The behavior is almost
never going to be different until you start trying to use extremely
large (say, over a megabyte) writes: if you actually depend on writes
being complete and atomic, you check that what you intended to write
(a reasonable amount) was exactly what was written in a single system
call.  If you don't, then you're supposed to correctly handle short
writes by completing them yourself.
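
To illustrate what "completing them yourself" means, here is a minimal
sketch of a retry loop for a blocking descriptor.  The helper name
write_fully() is hypothetical and is not part of the patch; the sketch
only assumes the standard write(2) short-write semantics discussed
above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): keep calling write(2)
 * until all len bytes have been written or a real error occurs.
 * Returns 0 on success, -1 on error with errno as set by write(2).
 * Assumes fd is a blocking descriptor.
 */
int
write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved; retry */
			return (-1);
		}
		/* Short write: advance past what was accepted and go again. */
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```

Note that the result is complete but not atomic: another writer can
get in between two iterations of the loop.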

If your application expects to have multiple interleaved appenders do
the right thing for these giant writes, I don't expect it will work.
The implementation, but not the manpage, would continue to match the
POSIX semantics (with regard to short writes).

See: <http://www.opengroup.org/onlinepubs/009695399/functions/write.html>

If it expects NFS-level append atomicity of large writes, it will not
get that.  If it expects local-machine-level append atomicity of large
writes, it could get that if we provide an interface for !IO_UNIT.
Note that file locking is also an option...  I don't believe there is
any way to provide unlimited-sized NFS client writes that are
retryable at the NFS atomic transaction level.
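
As an illustration of the file locking option, here is a minimal sketch
that holds an exclusive advisory fcntl(2) lock across a multi-call
append, so that cooperating appenders do not interleave.  The helper
name locked_append() is hypothetical, the descriptor is assumed to be
open for writing with O_APPEND, and whether the lock is honored by
other NFS clients depends on the lock daemon setup, which this sketch
does not address.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append len bytes to fd as one unit with respect
 * to other processes that take the same lock.  Returns 0 on success,
 * -1 on error with errno describing the failure.
 */
int
locked_append(int fd, const void *buf, size_t len)
{
	struct flock fl;
	const char *p = buf;
	int rc = 0, saved;

	assert(buf != NULL || len == 0);
	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;		/* exclusive (write) lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 means through end of file */

	/* Block until the lock is granted, retrying if interrupted. */
	while (fcntl(fd, F_SETLKW, &fl) == -1) {
		if (errno != EINTR)
			return (-1);
	}
	/* Complete short writes while still holding the lock. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			rc = -1;
			break;
		}
		p += n;
		len -= (size_t)n;
	}
	/* Always release the lock, preserving errno from the write. */
	saved = errno;
	fl.l_type = F_UNLCK;
	(void)fcntl(fd, F_SETLK, &fl);
	errno = saved;
	return (rc);
}
```

This only helps if every appender takes the same lock; the lock is
advisory, so a writer that skips it can still interleave.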

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green at FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\