Fixes to allow write clustering of NFS writes from a FreeBSD NFS client

Rick Macklem rmacklem at uoguelph.ca
Fri Aug 26 17:43:40 UTC 2011


Correcting myself yet again:
> I eventually tracked this down to the code in the NFS client that
> pushes out a previous dirty region via 'bwrite()' when a write would
> dirty a non-contiguous region in the buffer:
> 
>     if (bp->b_dirtyend > 0 &&
>         (on > bp->b_dirtyend || (on + n) < bp->b_dirtyoff)) {
>             if (bwrite(bp) == EINTR) {
>                     error = EINTR;
>                     break;
>             }
>             goto again;
>     }
> 
  Btw, the code was correct to use FILESYNC for this case.
  Why? Well, if the b_dirtyoff, b_dirtyend are used by the "bottom half"
  for the write/commit RPCs, the client won't know to re-write/commit
  the range specified by b_dirtyoff/b_dirtyend after the range changes.
  (ie. If the server crashes/reboots between the UNSTABLE write and the
   commit, the change will get lost.)

  However, if you calculate the off, len arguments for the Commit RPC
  to cover the entire block and not just b_dirtyoff->b_dirtyend, then
  doing the write UNSTABLE should be fine. (Having the range larger than
  what was written should be ok. In fact the FreeBSD server ignores
  the arguments and commits the entire file via VOP_FSYNC().)

I realize I was wrong w.r.t. this.
If the server crashes and reboots between the write RPCs and the Commit RPC,
the client will only know the last byte range to re-write (the current
b_dirtyoff/b_dirtyend), not the ranges that were written before it.
For this to work correctly for UNSTABLE writes, a list of dirty byte ranges
must be maintained and the client must do write RPCs for all of them (and do
them again, if the server crashes before the commit).
Btw, there is code in the NFSv4 stuff that handles a list of byte ranges.
It does so for the byte range locking, but you could just rename
struct nfscllock to something without "lock" in it and then reuse
nfscl_updatelock() to handle the list. (It might need a few tweaks for
the non-lock case, but shouldn't need much.)
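
To make the "list of dirty byte ranges" idea concrete, here is a rough
userland sketch in C. It is not the nfscl_updatelock() code and not a
patch for the client; struct dirty_range, dirty_range_add() and
dirty_range_clear() are hypothetical names made up for illustration.
It only shows the bookkeeping: each UNSTABLE write records its range,
and ranges that overlap or touch are merged so the list stays short.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: one dirty byte range [first, end) not yet committed. */
struct dirty_range {
	uint64_t first;			/* offset of first dirty byte */
	uint64_t end;			/* offset just past last dirty byte */
	struct dirty_range *next;	/* next range, sorted by offset */
};

/*
 * Record [first, end) in the sorted list, merging it with every existing
 * range that it overlaps or abuts.
 * Returns 0 on success, -1 for an empty range or allocation failure.
 */
int
dirty_range_add(struct dirty_range **headp, uint64_t first, uint64_t end)
{
	struct dirty_range **pp = headp, *rp, *np;

	if (first >= end)
		return (-1);
	/* Allocate up front so a failure leaves the list untouched. */
	np = malloc(sizeof(*np));
	if (np == NULL)
		return (-1);
	/* Skip ranges that lie entirely before the new one. */
	while ((rp = *pp) != NULL && rp->end < first)
		pp = &rp->next;
	/* Absorb every range that overlaps or touches the new one. */
	while ((rp = *pp) != NULL && rp->first <= end) {
		if (rp->first < first)
			first = rp->first;
		if (rp->end > end)
			end = rp->end;
		*pp = rp->next;
		free(rp);
	}
	np->first = first;
	np->end = end;
	np->next = *pp;
	*pp = np;
	return (0);
}

/* Drop the whole list, e.g. once a commit has covered every range. */
void
dirty_range_clear(struct dirty_range **headp)
{
	struct dirty_range *rp;

	while ((rp = *headp) != NULL) {
		*headp = rp->next;
		free(rp);
	}
}
```

In this sketch the list would be cleared only after the Commit RPC
succeeds and the server has not rebooted in between; otherwise the
client would walk the list and redo the write RPC for each range on it.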

Hopefully I have finally got this correct and have not totally confused
everyone, rick




More information about the freebsd-fs mailing list