amd64/161493: NFS v3 directory structure update slow

Rick Macklem rmacklem at uoguelph.ca
Thu Oct 13 01:00:29 UTC 2011


The following reply was made to PR kern/161493; it has been noted by GNATS.

From: Rick Macklem <rmacklem at uoguelph.ca>
To: John Baldwin <jhb at freebsd.org>
Cc: George Breahna <george at polarismail.com>, freebsd-gnats-submit at freebsd.org, 
	Rick Macklem <rmacklem at freebsd.org>, freebsd-amd64 at freebsd.org
Subject: Re: amd64/161493: NFS v3 directory structure update slow
Date: Wed, 12 Oct 2011 20:25:11 -0400 (EDT)

 John Baldwin wrote:
 > On Tuesday, October 11, 2011 11:07:13 am George Breahna wrote:
 > >
 > > >Number: 161493
 > > >Category: amd64
 > > >Synopsis: NFS v3 directory structure update slow
 > > >Confidential: no
 > > >Severity: critical
 > > >Priority: high
 > > >Responsible: freebsd-amd64
 > > >State: open
 > > >Quarter:
 > > >Keywords:
 > > >Date-Required:
 > > >Class: sw-bug
 > > >Submitter-Id: current-users
 > > >Arrival-Date: Tue Oct 11 15:10:07 UTC 2011
 > > >Closed-Date:
 > > >Last-Modified:
 > > >Originator: George Breahna
 > > >Release: 9.0 Beta 2
 > > >Organization:
 > > >Environment:
 > > FreeBSD store2 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Sun Sep 18 22:02:45 EDT 2011
 > >     pulsar at store2.emailarray.com:/usr/obj/usr/src/sys/PULSAR  amd64
 > > >Description:
 > > We used to run an NFS server on FreeBSD 6.2, but we recently built a new box
 > > and installed 9.0 Beta 2 on it. The data was moved over, as the box serves
 > > as the back-end for a mail system. It runs NFS v3 over TCP only, and all the
 > > NFS-related processes (rpcbind, mountd, lockd, etc.) run with the -h switch
 > > and bind to the local IP address.
 > >
 > > The NFS server exports the data to 7 NFS clients running FreeBSD 6.1 through
 > > 8.2, the majority being 8.2. The NFS clients mount it simply with:
 > > -o tcp,rsize=32768,wsize=32768
 > >
 > > Usual file operations, such as accessing files, creating directories,
 > > removing files, chmod, chown, etc., work perfectly, but we noticed there
 > > were issues removing directories that contained data. We got a strange
 > > error:
 > >
 > > rm -rf nick/
 > > rm: fts_read: Input/output error
 > >
 > > Using 'truss' on rm revealed this:
 > >
 > > open("..",O_RDONLY,00) ERR#5 'Input/output error'
 > >
 > > After much testing and debugging we realized the problem is in NFS
 > > (either the server or the client, but we assume the server, since this
 > > used to work very well with FreeBSD 6.2). The problem appears to be that
 > > NFS does not show '..' after a directory structure is modified. Take the
 > > following example, executed on a FreeBSD 8.2 client accessing the NFS
 > > share from the 9.0B2 server:
 > >
 > > imap5# mkdir test1
 > > imap5# cd test1
 > > imap5# touch file1
 > > imap5# touch file2
 > > imap5# ls -la
 > > ls: ..: Input/output error
 > > total 4
 > > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:55 .
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file1
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file2
 > >
 > > Notice that '..' is missing from the listing. If we now try to remove the
 > > directory 'test1', rm fails with "rm: fts_read: Input/output error".
 > >
 > > If we wait between 1 and 5 minutes, '..' eventually appears by itself.
 > > During this whole time, '..' exists on the NFS server, but none of the
 > > NFS clients display it.
 > >
 > > I can force the NFS client to show it sooner by doing an ls -la from the
 > > parent level. For example:
 > >
 > > imap5# mkdir test1
 > > imap5# touch test1/file1
 > > imap5# touch test1/file2
 > > imap5# touch test1/file3
 > > imap5# ls -la test1
 > > total 8
 > > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:59 .
 > > drwx------ 10 vpopmail vchkpw 1024 Oct 11 10:59 ..
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file1
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file2
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file3
 > > imap5# cd test1
 > > imap5# ls -la
 > > total 8
 > > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:59 .
 > > drwx------ 10 vpopmail vchkpw 1024 Oct 11 10:59 ..
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file1
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file2
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file3
 > >
 > > but if we wait 5 seconds after that display and try again:
 > >
 > > ls -la
 > > ls: ..: Input/output error
 > > total 4
 > > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:59 .
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file1
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file2
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:59 file3
 > >
 > > Again, if we wait longer (1-5 minutes), '..' properly appears there.
 > >
 > > There are no error messages on the console or in other log files. This is
 > > reproducible 100% of the time with any FreeBSD client. We have tried
 > > unmounting/remounting several times without any effect, and also tried
 > > different rsize/wsize values, with no effect. I think there is some delay
 > > in updating the directory structure, and it's causing this bug.
 > >
 > > Here's also some output from nfsstat on the server:
 > >
 > > Server Info:
 > >   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
 > > 114731225  20496896 254966151       133  11697392  19963641         0   9228861
 > >    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
 > >   4313471   1157651        39      1955  16511932  15479669         0 116927742
 > >     Mknod    Fsstat    Fsinfo  PathConf    Commit
 > >         0   4748487        48         0  14921747
 > > Server Ret-Failed
 > >                 0
 > > Server Faults
 > >             0
 > > Server Cache Stats:
 > >    Inprog      Idem  Non-idem    Misses
 > >         0         0         0 613368147
 > > Server Write Gathering:
 > >  WriteOps  WriteRPC   Opsaved
 > >  19963641  19963641         0
 > >
 > > >How-To-Repeat:
 > > imap5# mkdir test1
 > > imap5# cd test1
 > > imap5# touch file1
 > > imap5# touch file2
 > > imap5# ls -la
 > > ls: ..: Input/output error
 > > total 4
 > > drwxr-xr-x 2 root vchkpw 512 Oct 11 10:55 .
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file1
 > > -rw-r--r-- 1 root vchkpw 0 Oct 11 10:55 file2
 > > >Fix:
 > 
 > Can you try using the "old" NFS server as a test?
 > 
 Please make sure you have the patch in r225356 in your server's
 kernel sources (it went into head on Sep. 3, but I don't know whether
 your Sep. 11 build would have it). It fixed a problem that caused
 lookups of ".." to fail intermittently, because a field added to
 struct nameidata on Aug. 13 wasn't initialized.
 
 You can find the one-line patch here:
    http://svnweb.freebsd.org/base/head/sys/fs/nfsserver/nfs_nfsdport.c?r1=224911&r2=225356
 
 Please let us know if you have this patch and, if not, apply it
 and see if the problem goes away.
 
 Thanks, rick
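
 Whether a server source tree already contains r225356 can be estimated from
 its working-copy revision. A minimal sketch, assuming /usr/src is a Subversion
 checkout of head and that svnversion is installed; the helper name has_r225356
 is hypothetical. A mixed-revision string such as 225300:225400 is judged by its
 lower bound, which is conservative. On a stable branch a higher revision number
 alone does not prove the change was merged, and a tree that is not an svn
 checkout has to be compared against the linked diff by hand.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if a revision string in svnversion's
# format ("225400", "225400M", "225300:225400") is at or past r225356.
# Only meaningful for a checkout of head, where r225356 was committed.
has_r225356() {
    rev=${1%%:*}          # mixed revisions "LOW:HIGH": keep the lower bound
    rev=${rev%%[!0-9]*}   # strip trailing flags such as M (modified), S, P
    [ -n "$rev" ] && [ "$rev" -ge 225356 ]
}

# Usage: check the server's kernel source tree.
# if has_r225356 "$(svnversion /usr/src)"; then echo "r225356 present"; fi
```

 Anything without a leading revision number (for example an unversioned tree)
 makes the helper fail rather than guess.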
 
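 To confirm whether the problem goes away once the patch is in, the PR's
 How-To-Repeat steps can be scripted and run on a client. A minimal sketch in
 POSIX sh; the function name check_dotdot and the test directory name are
 hypothetical, and the argument is assumed to be a writable directory on the
 NFS mount. The run counts as failed if either ls -la inside the new directory
 or a direct ls -ld .. exits with an error, which is how the Input/output error
 shows up in the PR.

```sh
#!/bin/sh
# Hypothetical helper: repeat the PR's How-To-Repeat steps under the
# directory given as $1 (for example a path on the NFS mount) and print
# "ok" if ".." can be looked up from the new directory, "FAIL" otherwise.
check_dotdot() {
    base=${1:-.}
    dir="$base/dotdot_test.$$"        # PID suffix keeps concurrent runs apart

    mkdir "$dir" || return 1
    # Same sequence as the PR: cd into the new directory, create two files,
    # then list it. On an affected client ls reports "..: Input/output error"
    # and exits non-zero; "ls -ld .." then looks up ".." directly.
    if ( cd "$dir" && touch file1 file2 && ls -la >/dev/null && ls -ld .. >/dev/null ); then
        result=ok
    else
        result=FAIL
    fi

    rm -rf "$dir"                      # may itself fail on an affected client
    echo "$result"
}

# Usage (the path is only an example): check_dotdot /mnt/nfs/tmp
```

 Since the PR reports that '..' reappears on its own after 1 to 5 minutes,
 the check has to run right after the directory is created, as the function
 does, to exercise the failing case.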


More information about the freebsd-fs mailing list