NFS 75 second stall

Rick Macklem rmacklem at uoguelph.ca
Wed Sep 1 16:05:49 UTC 2010


> On 07/01/10 15:23, Garrett Cooper wrote:
> > On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan at yahoo.com> wrote:
> >>
> >> --- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:
> >>
> >>> From: Garrett Cooper <yanefbsd at gmail.com>
> >>> Subject: Re: NFS 75 second stall
> >>> To: "alan bryan" <alan.bryan at yahoo.com>
> >>> Cc: freebsd-stable at freebsd.org
> >>> Date: Thursday, July 1, 2010, 11:13 AM
> >>> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan at yahoo.com> wrote:
> >>>> Setup:
> >>>>
> >>>> server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
> >>>> client - FreeBSD 8.0-Release. Running a test php script that copies
> >>>> around various files to/from 2 separate NFS mounts.
> >>>>
> >>>> Situation:
> >>>>
> >>>> The script is started (forked to do 20 simultaneous runs) and 20 1GB
> >>>> files are copied to the NFS dir, which works fine. When it then
> >>>> switches to reading those files back and simultaneously writing to
> >>>> the other NFS mount, I see a hang of 75 seconds. If I do an "ls -l"
> >>>> on the NFS mount it hangs too. After 75 seconds the client has
> >>>> reported:
> >>>> nfs server 192.168.10.133:/usr/local/export1: not responding
> >>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
> >>>> nfs server 192.168.10.133:/usr/local/export1: not responding
> >>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
> >>>> and then things start working again. The server was originally
> >>>> FreeBSD 8.0-Release also but was upgraded to the latest stable to
> >>>> see if this issue could be avoided.
> >>>> # nfsstat -s -W -w 1
> >>>>  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >>>>       0      0      0    222    257      0      0      0
> >>>>       0      0      0    178    135      0      0      0
> >>>>       0      0      0     85    127      0      0      0
> >>>>       0      0      0      0      0      0      0      0
> >>>>       0      0      0      0      0      0      0      0
> >>>>       0      0      0      0      0      0      0      0
> >>>>       0      0      0      0      0      0      0      0
> >>>>       0      0      0      0      0      0      0      0
> >>>>       0      0      0      0      0      0      0      0
> >>>> ... for 75 rows of all zeros
> >>>>
> >>>>       0      0      0    272    266      0      0      0
> >>>>       0      0      0    167    165      0      0      0
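The nfsstat output above is sampled once per second (`-w 1`), so the length of a stall is simply the number of consecutive all-zero rows. A minimal Python sketch (not part of the original thread) that automates that count, assuming the plain whitespace-separated layout shown here; the function names `parse_nfsstat` and `find_stalls` are hypothetical, and the 75 zero rows in the demo are synthesized from the poster's description rather than copied from a real capture:

```python
from typing import List, Optional, Tuple


def parse_nfsstat(text: str) -> List[List[int]]:
    """Keep only all-integer lines (data rows); skip headers, prompts, notes."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    return rows


def find_stalls(rows: List[List[int]],
                interval_s: float = 1.0) -> List[Tuple[int, float]]:
    """Return (first_row_index, duration_seconds) for each run of all-zero rows.

    Assumes one row per `interval_s` seconds, as with `nfsstat -w 1`.
    """
    stalls = []
    run_start: Optional[int] = None
    for i, row in enumerate(rows):
        if all(v == 0 for v in row):
            if run_start is None:
                run_start = i  # first idle interval of a new stall
        elif run_start is not None:
            stalls.append((run_start, (i - run_start) * interval_s))
            run_start = None
    if run_start is not None:  # capture ended while still stalled
        stalls.append((run_start, (len(rows) - run_start) * interval_s))
    return stalls


if __name__ == "__main__":
    zero_row = "       0      0      0      0      0      0      0      0\n"
    sample = (
        "  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir\n"
        "       0      0      0    222    257      0      0      0\n"
        "       0      0      0    178    135      0      0      0\n"
        "       0      0      0     85    127      0      0      0\n"
        + zero_row * 75  # synthesized from "... for 75 rows of all zeros"
        + "       0      0      0    272    266      0      0      0\n"
        "       0      0      0    167    165      0      0      0\n"
    )
    for start, secs in find_stalls(parse_nfsstat(sample)):
        print(f"stall starting at sample {start}: {secs:.0f} s with zero NFS RPCs")
    # prints: stall starting at sample 3: 75 s with zero NFS RPCs
```

Counting runs rather than total zero rows keeps separate stalls apart, which matters when comparing the 15-process run (about 5 s) with the 20- and 25-process runs (75 s).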
> >>>> I also tried runs with 15 and 25 simultaneous processes. 15
> >>>> processes gave only about a 5 second stall, but 25 again gave the
> >>>> same 75 second stall.
> >>>> Further, I tested with 2 mounts to the same server but from ZFS
> >>>> filesystems, with exactly the same stall/timeout periods. So it
> >>>> doesn't appear to matter what the underlying filesystem is - it's
> >>>> something in the NFS or networking code.
> >>>> Any ideas on what's going on here? What's causing the complete stall
> >>>> period of zero NFS activity? Any flaws with my testing methods?
> >>>> Thanks for any and all help/ideas.
> >>> What network driver are you using? Have you tried
> >>> tcpdumping the packets?
> >>> -Garrett
> >>>
> >> I'm using igb currently but have also used em. I have not tried
> >> tcpdumping the packets yet on this test. Any suggestions on things
> >> to look out for? (I'm not that familiar with that whole process.)
> >>
> >> Which brings up another point - I'm using TCP connections for NFS,
> >> not UDP.
> >      Is the net.inet.tcp.tso sysctl enabled or not? What about
> >      rxcsum and txcsum?
> > Thanks,
> > -Garrett
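Garrett's offload question can be checked on the client itself: `sysctl net.inet.tcp.tso` shows the global TSO setting, the `options=` line of `ifconfig igb0` lists the per-interface offloads, and `ifconfig igb0 -tso -rxcsum -txcsum` turns them off for a test run. The short Python sketch here only parses that `options=` line; it assumes FreeBSD's `options=<hex><NAME,...>` format, the helper names are hypothetical, and the sample interface output is illustrative, not taken from the posters' machines.

```python
import re
from typing import Dict, Set

# The offloads asked about in the thread (TSO4 is the per-interface TSO flag).
OFFLOADS = ("RXCSUM", "TXCSUM", "TSO4")


def offload_flags(ifconfig_output: str) -> Set[str]:
    """Return the capability names from the first 'options=<hex><...>' field."""
    m = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    if m is None or not m.group(1):
        return set()
    return set(m.group(1).split(","))


def offload_report(ifconfig_output: str) -> Dict[str, bool]:
    """Map each offload of interest to whether it is currently enabled."""
    enabled = offload_flags(ifconfig_output)
    return {name: name in enabled for name in OFFLOADS}


if __name__ == "__main__":
    # Illustrative output only; run `ifconfig igb0` on the client for real data.
    sample = (
        "igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
        "\toptions=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
        "VLAN_HWCSUM,TSO4>\n"
    )
    print(offload_report(sample))
    # prints: {'RXCSUM': True, 'TXCSUM': True, 'TSO4': True}
```

Rerunning the report after disabling the offloads is a quick way to confirm the change took effect before repeating the 20-process test.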
> 
> We're occasionally seeing these same types of stalls (plus repeated
> "is not responding" / "is alive again" messages in quick succession).
> We're seeing it only on our 8.1-RELEASE systems against a variety of
> NFS servers (6.3-RELEASE, 7.2-RELEASE, and 8-STABLE from before the
> release of 8.1). We also see it happen with a variety of client
> hardware and network adapters (em, bce, bge); the only common
> denominator is 8.1-RELEASE on the clients.
> 
You could try the attached patch. It won't fix anything, but it
should print out what the errno is that is causing a TCP reconnect
and might give us a hint w.r.t. what is going on.

rick
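The patch itself was scrubbed from the archive, so the exact message it prints is not shown here. Whatever the format, the errno it reports is a FreeBSD kernel value, and several network errno numbers differ between FreeBSD and Linux (ETIMEDOUT is 60 on FreeBSD but 110 on Linux), so decoding it with a non-BSD host's tables can mislead. A hypothetical lookup helper, assuming the values in FreeBSD's `<sys/errno.h>`:

```python
# Subset of FreeBSD errno values (from <sys/errno.h>) that commonly appear
# when a TCP connection is torn down or reconnected. These numbers are
# BSD-specific; Python's own errno module on Linux would decode several of
# them differently.
FREEBSD_ERRNO = {
    4: "EINTR",
    32: "EPIPE",
    35: "EAGAIN",
    50: "ENETDOWN",
    51: "ENETUNREACH",
    53: "ECONNABORTED",
    54: "ECONNRESET",
    57: "ENOTCONN",
    60: "ETIMEDOUT",
    61: "ECONNREFUSED",
    64: "EHOSTDOWN",
    65: "EHOSTUNREACH",
}


def decode_freebsd_errno(value: int) -> str:
    """Return the symbolic name for a FreeBSD errno, if it is in the table."""
    return FREEBSD_ERRNO.get(value, f"unknown errno {value}")


if __name__ == "__main__":
    for e in (60, 54, 32):
        print(e, decode_freebsd_errno(e))
```

Telling ETIMEDOUT apart from ECONNRESET would at least show whether the client gave up waiting or the server side dropped the connection.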

-------------- next part --------------
A non-text attachment was scrubbed...
Name: clnt_rc.patch
Type: text/x-patch
Size: 395 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20100901/5c613fd2/clnt_rc.bin

