NFS 75 second stall
Steve Polyack
korvus at comcast.net
Wed Sep 1 15:56:50 UTC 2010
On 07/01/10 15:23, Garrett Cooper wrote:
> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan<alan.bryan at yahoo.com> wrote:
>>
>> --- On Thu, 7/1/10, Garrett Cooper<yanefbsd at gmail.com> wrote:
>>
>>> From: Garrett Cooper<yanefbsd at gmail.com>
>>> Subject: Re: NFS 75 second stall
>>> To: "alan bryan"<alan.bryan at yahoo.com>
>>> Cc: freebsd-stable at freebsd.org
>>> Date: Thursday, July 1, 2010, 11:13 AM
>>> On Thu, Jul 1, 2010 at 11:01 AM, alan
>>> bryan<alan.bryan at yahoo.com>
>>> wrote:
>>>> Setup:
>>>>
>>>> server - FreeBSD 8-stable from today. 2 UFS dirs
>>> exported via NFS.
>>>> client - FreeBSD 8.0-Release. Running a test php
>>> script that copies around various files to/from 2 separate
>>> NFS mounts.
>>>> Situation:
>>>>
>>>> script is started (forked to do 20 simultaneous runs)
>>> and 20 1GB files are copied to the NFS dir which works
>>> fine. When it then switches to reading those files back
>>> and simultaneously writing to the other NFS mount I see a
>>> hang of 75 seconds. If I do an "ls -l" on the NFS mount it
>>> hangs too. After 75 seconds the client has reported:
>>>> nfs server 192.168.10.133:/usr/local/export1: not responding
>>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
>>>> nfs server 192.168.10.133:/usr/local/export1: not responding
>>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
>>>> and then things start working again. The server was
>>> originally FreeBSD 8.0-Release also but was upgraded to the
>>> latest stable to see if this issue could be avoided.
>>>> # nfsstat -s -W -w 1
>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>      0      0      0    222    257      0      0      0
>>>>      0      0      0    178    135      0      0      0
>>>>      0      0      0     85    127      0      0      0
>>>>      0      0      0      0      0      0      0      0
>>>>      0      0      0      0      0      0      0      0
>>>>      0      0      0      0      0      0      0      0
>>>>      0      0      0      0      0      0      0      0
>>>>      0      0      0      0      0      0      0      0
>>>> ... for 75 rows of all zeros
>>>>
>>>>      0      0      0    272    266      0      0      0
>>>>      0      0      0    167    165      0      0      0
>>>> I also tried runs with 15 simultaneous processes and
>>> 25. 15 processes gave only about a 5 second stall but 25
>>> gave again the same 75 second stall.
>>>> Further, I tested with 2 mounts to the same server but
>>>> from ZFS filesystems with the exact same stall/timeout
>>>> periods. So, it doesn't appear to matter what the
>>>> underlying filesystem is - it's something in NFS or
>>>> networking code.
>>>> Any ideas on what's going on here? What's causing
>>> the complete stall period of zero NFS activity? Any flaws
>>> with my testing methods?
>>>> Thanks for any and all help/ideas.
>>> What network driver are you using? Have you tried
>>> tcpdumping the packets?
>>> -Garrett
>>>
>> I'm using igb currently but have also used em. I have not tried tcpdumping the packets yet on this test. Any suggestions on things to look out for? (I'm not that familiar with that whole process.)
>>
>> Which brings up another point - I'm using TCP connections for NFS, not UDP.
> Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum and txcsum?
> Thanks,
> -Garrett
We're occasionally seeing these same types of stalls (+ repeated "is
not responding" "is alive again" messages in quick succession). We're
seeing it only on our 8.1-RELEASE systems against a variety of NFS
servers (6.3-RELEASE, 7.2-RELEASE, and 8-STABLE from before the release
of 8.1). We also see it happen with a variety of client hardware and
network adapters (em, bce, bge); the only common denominator is
8.1-RELEASE on the clients.
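
One way to put a number on such a stall is to capture the output of
"nfsstat -s -W -w 1" to a file and count the longest run of all-zero rows.
What follows is only a minimal Python sketch, assuming one sample row per
second made up of whitespace-separated integer columns, as in the output
quoted earlier in the thread. The function name longest_stall and the sample
data are hypothetical.

```python
def longest_stall(lines):
    """Return the longest run of consecutive all-zero sample rows."""
    longest = current = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and any header row (non-numeric fields).
        if not fields or not all(f.isdigit() for f in fields):
            continue
        if all(int(f) == 0 for f in fields):
            current += 1
            longest = max(longest, current)
        else:
            # Any non-zero counter means NFS traffic resumed.
            current = 0
    return longest


# Hypothetical capture, shortened for illustration.
sample = """\
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0     85    127      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0    272    266      0      0      0
""".splitlines()

print(longest_stall(sample))  # prints 3
```

With a 1 second interval the result is roughly the stall length in seconds,
which makes it easier to compare runs across client versions or NIC settings.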