Not panic in nfsd (Re: panic in nfsd on 6.2-RC1)

Kostik Belousov kostikbel at gmail.com
Fri Dec 15 13:22:52 PST 2006


On Fri, Dec 15, 2006 at 02:29:58PM -0500, Kris Kennaway wrote:
> On Fri, Dec 15, 2006 at 02:12:16PM -0500, Sven Willenberger wrote:
> > On Fri, 2006-12-15 at 13:15 -0500, Kris Kennaway wrote:
> > > On Fri, Dec 15, 2006 at 10:01:19AM -0500, Sven Willenberger wrote:
> > > > On Tue, 2006-12-05 at 12:38 +0900, Hiroki Sato wrote:
> > > > > Kostik Belousov <kostikbel at gmail.com> wrote
> > > > >   in <20061204160949.GM35681 at deviant.kiev.zoral.com.ua>:
> > > > > 
> > > > > ko> What version of sys/nfsserver/nfs_serv.c do you use? If it is older
> > > > > ko> than 1.156.2.7, please update the system.
> > > > > 
> > > > >  Thanks, I updated it just now and will see how it works.
> > > > > 
> > > > > --
> > > > > | Hiroki SATO
> > > > 
> > > > I was/am having the same issue. Updating world (6.2-stable) to include
> > > > the above update sadly did not fix the problem for me. This is an amd64
> > > > box with only one client connecting to it via nfs. Reading further it
> > > > may seem to be an issue with rpc.statd and/or rpc.lockd. As I only have
> > > > one client connecting and it is being used as mail storage (i.e. the
> > > > client pops/imaps the storage) would it be safe to not forward fcntl locks
> > > > over the wire? Is this same issue present in 6.1-RELENG? I am really at
> > > > my wits end at this point and for the first time am actually considering
> > > > moving to another OS (solaris more than likely) as I cannot have these
> > > > types of issues interrupting services every couple days.
> > > > 
> > > > What other information (specifically) can I provide to help the devs
> > > > figure out what is going on? What can I do in the meantime to have some
> > > > semblance of stability? I assume downgrading to 5.5-RELENG is out of the
> > > > question but perhaps disabling SMP?
> > > 
> > > Just to confirm, can you please post the panic backtrace you are
> > > seeing?  And can you explain what you mean by "may seem to be an issue
> > > with rpc.statd and/or rpc.lockd"?
> > > 
> > > Sometimes people think they're seeing the same problem as someone else
> > > when really it's a completely different problem in the same subsystem,
> > > so I'd like to rule that out here.
> > > 
> > > Kris
> > 
> > Well I have now added kdb and invariants/witness support to the kernel
> > so I should be able to get some backtrace the next time it happens.
> > Currently, the system just locks and no error is displayed on the
> > console or /var/log/messages; sorry I cannot be of immediate help there.
> 
> OK, so your issue is not, in fact, a "panic in nfsd" as you were
> claiming ;-)
> 
> > Regarding the rpc issue, I just ran across mention of those in sshfs/nfs
> > threads appearing here and in particular to a link referenced within one
> > > of them
> > > (http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1362611+0+archive/2006/freebsd-stable/20060702.freebsd-stable)
> > > - it is more than likely not at all related but I am grasping at
> > > straws here trying to solve this.
> 
> Yes, I think you are grasping at straws.  At this point, you need to
> do some debugging to find out the source of your problem, and treat it
> as a new bug until you find conclusive evidence that it's the same as
> a previously reported bug.
> 
> Guessing without evidence that your problem is the same as someone
> else's problem, because e.g. both involve your system becoming
> unresponsive, is a very good way to confuse the issue and delay
> resolution.
>  
> > FWIW, I do see the following appearing in the /var/log/messages:
> > ufs_rename: fvp == tvp (can't happen) 
> > about once or twice a day, but cannot correlate those to lockup. Now
> > that I have enabled the options mentioned above in the kernel, I am
> > seeing some LOR issues:
> > 
> > kernel: lock order reversal:
> > kernel: 1st 0xffffff00c3bab200 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
> > kernel: 2nd 0xffffff0005bb6078 struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138
> 
> OK, this is interesting, so let's proceed from here.
> 
> Kris

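Try the patch below. It drops the MNT_ILOCK acquisition in ufs_vnops.c, which
looks like the second lock in the LOR you reported.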

Index: ufs/ufs/ufs_vnops.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.283
diff -u -r1.283 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c	6 Nov 2006 13:42:09 -0000	1.283
+++ ufs/ufs/ufs_vnops.c	15 Dec 2006 21:19:51 -0000
@@ -133,19 +133,15 @@
 {
 	struct inode *ip;
 	struct timespec ts;
-	int mnt_locked;
 
 	ip = VTOI(vp);
-	mnt_locked = 0;
-	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
-		VI_LOCK(vp);
+	VI_LOCK(vp);
+	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
 		goto out;
+	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
+		VI_UNLOCK(vp);
+		return;
 	}
-	MNT_ILOCK(vp->v_mount);		/* For reading of mnt_kern_flags. */
-	mnt_locked = 1;
-	VI_LOCK(vp);
-	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
-		goto out_unl;
 
 	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
 		ip->i_flag |= IN_LAZYMOD;
@@ -172,10 +168,7 @@
 
  out:
 	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
- out_unl:
 	VI_UNLOCK(vp);
-	if (mnt_locked)
-		MNT_IUNLOCK(vp->v_mount);
 }
 
 /*
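
For anyone following along, here is a rough user-space sketch of why
removing a lock from one code path can make a lock order reversal report
go away. It is not kernel code and not how WITNESS is implemented. The
names (order_checker, oc_acquire, LK_KQUEUE and so on) are hypothetical
and exist only for this illustration. The path that takes the mount mutex
before the kqueue lock is an assumption as well; the report above only
shows the kqueue -> mount mtx order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock classes for this sketch; not kernel identifiers. */
enum { LK_KQUEUE, LK_MOUNT, LK_VNODE_INTERLOCK, LK_COUNT };

struct order_checker {
	int held[LK_COUNT];		/* nonzero while the lock is held */
	int before[LK_COUNT][LK_COUNT];	/* before[a][b]: a was held when b was taken */
};

static void
oc_init(struct order_checker *oc)
{
	memset(oc, 0, sizeof(*oc));
}

/* Record an acquisition; return 1 if it reverses an order seen earlier. */
static int
oc_acquire(struct order_checker *oc, int lk)
{
	int i, reversal = 0;

	assert(lk >= 0 && lk < LK_COUNT);
	for (i = 0; i < LK_COUNT; i++) {
		if (i == lk || !oc->held[i])
			continue;
		if (oc->before[lk][i])
			reversal = 1;	/* lk -> i was recorded, now i -> lk */
		oc->before[i][lk] = 1;
	}
	oc->held[lk] = 1;
	return (reversal);
}

static void
oc_release(struct order_checker *oc, int lk)
{
	assert(lk >= 0 && lk < LK_COUNT);
	oc->held[lk] = 0;
}

/* Assumed path: mount mutex first, then the kqueue lock. */
static int
path_mount_then_kqueue(struct order_checker *oc)
{
	int r = 0;

	r |= oc_acquire(oc, LK_MOUNT);
	r |= oc_acquire(oc, LK_KQUEUE);
	oc_release(oc, LK_KQUEUE);
	oc_release(oc, LK_MOUNT);
	return (r);
}

/*
 * Timestamp update reached with the kqueue lock held.
 * take_mount != 0 models the code before the patch (mount mutex, then
 * vnode interlock); take_mount == 0 models the patched code.
 */
static int
path_kqueue_then_itimes(struct order_checker *oc, int take_mount)
{
	int r = 0;

	r |= oc_acquire(oc, LK_KQUEUE);
	if (take_mount)
		r |= oc_acquire(oc, LK_MOUNT);
	r |= oc_acquire(oc, LK_VNODE_INTERLOCK);
	oc_release(oc, LK_VNODE_INTERLOCK);
	if (take_mount)
		oc_release(oc, LK_MOUNT);
	oc_release(oc, LK_KQUEUE);
	return (r);
}
```

After oc_init() and one call to path_mount_then_kqueue(), calling
path_kqueue_then_itimes() with take_mount set reports a reversal, while
calling it with take_mount clear does not, since the mount mutex is never
taken under the kqueue lock on that path.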