kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R

Burt Rosenberg burt at cs.miami.edu
Wed Oct 14 14:40:05 UTC 2009


The following reply was made to PR kern/130628; it has been noted by GNATS.

From: Burt Rosenberg <burt at cs.miami.edu>
To: bug-followup at freebsd.org, Joe Marcus Clarke <marcus at marcuscom.com>
Cc:  
Subject: Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
Date: Wed, 14 Oct 2009 10:31:45 -0400

 The patch, which helped but did not entirely fix the deadlock, is not in
 7.2-p4, i386.
 
 Furthermore, we now have a deadlock on an NFS mount between a FreeBSD
 7.2-p3 machine and a Linux 2.6.18-164.el5 SMP i686 athlon i386 machine.
 
 In this situation there is a Cisco ASA 5220 between the Linux and FreeBSD
 boxes, and we run NFS over TCP.
 
 
 
 On Thu, Sep 3, 2009 at 2:40 PM, Burt Rosenberg <burt at cs.miami.edu> wrote:
 
 > It seems that the problem described in:
 >
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628
 >
 > appears in 7.2-R-p3. With this kernel, against Fedora 8 distros:
 >
 > Linux prism09.cs.miami.edu 2.6.26.8-57.fc8 #1 SMP Thu Dec 18 18:59:49 EST
 > 2008 x86_64 x86_64 x86_64 GNU/Linux
 >
 > which are using NFS (tcp) to mount homedirs from the FreeBSD server to the
 > Fedora client, the server becomes unresponsive on the network during
 > graphical login of a client.
 >
 > Applying the patch given in the article
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628 seems at present to
 > fix the problem. Under an unpatched 7.2-R-p3 we can reproduce the problem
 > in a few minutes; under the same kernel with the patches described in the
 > article, provided as diffs against the current source, we have not yet
 > seen the problem.
 >
 > When the problem appears, the server cannot be pinged, and other network
 > connections are halted.
 >
 > On the server, for instance, top shows:
 >
 > Proc        state    pri
 > -------------------------
 > rpc.lockd   *tcpin   -68
 > nfsd        -          4
 > rpcbind     select    44
 > ntpd        select    44
 > nfsd        select    44
 > ... etc...
 >
 >
 > Also,
 >
 > ./lockd restart
 > Stopping lockd.
 > Waiting for PIDS: 1114, 1114, 1114, 1114,....
 >
 > kill -9 1114 is also ineffective.
 >
 > So it seems to be something spinning in lockd.
 >
 > I think this is a serious issue and would like to see it resolved. Our
 > setup is available if you would like to send instrumented code. I attach
 > diffs.
 >
 >
 >
 >
 


More information about the freebsd-net mailing list