NFS 75 second stall

alan bryan alan.bryan at yahoo.com
Thu Jul 1 20:18:19 UTC 2010



--- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:

> From: Garrett Cooper <yanefbsd at gmail.com>
> Subject: Re: NFS 75 second stall
> To: "alan bryan" <alan.bryan at yahoo.com>
> Cc: freebsd-stable at freebsd.org
> Date: Thursday, July 1, 2010, 12:23 PM
> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan at yahoo.com> wrote:
> >
> >
> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:
> >
> >> From: Garrett Cooper <yanefbsd at gmail.com>
> >> Subject: Re: NFS 75 second stall
> >> To: "alan bryan" <alan.bryan at yahoo.com>
> >> Cc: freebsd-stable at freebsd.org
> >> Date: Thursday, July 1, 2010, 11:13 AM
> >> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan at yahoo.com> wrote:
> >> > Setup:
> >> >
> >> > server - FreeBSD 8-stable from today.  2 UFS dirs exported via NFS.
> >> > client - FreeBSD 8.0-Release.  Running a test php script that copies around various files to/from 2 separate NFS mounts.
> >> >
> >> > Situation:
> >> >
> >> > script is started (forked to do 20 simultaneous runs) and 20 1GB files are copied to the NFS dir which works fine.  When it then switches to reading those files back and simultaneously writing to the other NFS mount I see a hang of 75 seconds.  If I do an "ls -l" on the NFS mount it hangs too.  After 75 seconds the client has reported:
> >> >
> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >
> >> > and then things start working again.  The server was originally FreeBSD 8.0-Release also but was upgraded to the latest stable to see if this issue could be avoided.
> >> >
> >> > # nfsstat -s -W -w 1
> >> >  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >> >       0      0      0    222    257      0      0      0
> >> >       0      0      0    178    135      0      0      0
> >> >       0      0      0     85    127      0      0      0
> >> >       0      0      0      0      0      0      0      0
> >> >       0      0      0      0      0      0      0      0
> >> >       0      0      0      0      0      0      0      0
> >> >       0      0      0      0      0      0      0      0
> >> >       0      0      0      0      0      0      0      0
> >> >
> >> > ... for 75 rows of all zeros
> >> >
> >> >       0      0      0    272    266      0      0      0
> >> >       0      0      0    167    165      0      0      0
> >> >
> >> > I also tried runs with 15 simultaneous processes and 25.  15 processes gave only about a 5 second stall but 25 gave again the same 75 second stall.
> >> >
> >> > Further, I tested with 2 mounts to the same server but from ZFS filesystems with the exact same stall/timeout periods.  So, it doesn't appear to matter what the underlying filesystem is - it's something in NFS or networking code.
> >> >
> >> > Any ideas on what's going on here?  What's causing the complete stall period of zero NFS activity?  Any flaws with my testing methods?
> >> >
> >> > Thanks for any and all help/ideas.
> >>
> >> What network driver are you using? Have you tried
> >> tcpdumping the packets?
> >> -Garrett
> >>
> >
> > I'm using igb currently but have also used em.  I have not tried tcpdumping the packets yet on this test.  Any suggestions on things to look out for (I'm not that familiar with that whole process)?
> >
> > Which brings up another point - I'm using TCP connections for NFS, not UDP.
> 
>     Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum and txcsum?
> Thanks,
> -Garrett
> 

I haven't intentionally/explicitly set any of this so it's "default":

# sysctl net.inet.tcp.tso
net.inet.tcp.tso: 1

# ifconfig igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>
	ether 00:30:48:c3:26:94
	inet 192.168.10.133 netmask 0xffffff00 broadcast 192.168.10.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
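
In case it helps with comparing machines, here is a minimal sketch that lists the offload features from the options line of ifconfig output. The function name offload_flags is made up for illustration. The toggle commands in the comments are only an assumption about a possible next test, not something already suggested in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig <iface>` output on stdin and print
# each offload capability found in the options=...<...> line.
offload_flags() {
    awk '
        /options=/ {
            # Take the text between "<" and ">" and split it on commas.
            a = index($0, "<"); b = index($0, ">")
            if (a == 0 || b <= a) next
            n = split(substr($0, a + 1, b - a - 1), caps, ",")
            for (i = 1; i <= n; i++)
                if (caps[i] ~ /^(RXCSUM|TXCSUM|TSO4|TSO6|LRO)$/)
                    print caps[i]
        }'
}

# Usage on the server (igb0 is the interface shown above):
#   ifconfig igb0 | offload_flags
#
# Assumed next step, to rule the offloads out for a single test run:
#   ifconfig igb0 -tso -rxcsum -txcsum
#   sysctl net.inet.tcp.tso=0
```

With the igb0 output above this should print RXCSUM, TXCSUM and TSO4, one per line. The flags= line is skipped because only lines containing "options=" are examined.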

Thanks,
Alan

More information about the freebsd-stable mailing list