NFS 75 second stall

Garrett Cooper yanefbsd at gmail.com
Thu Jul 1 19:23:59 UTC 2010


On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan at yahoo.com> wrote:
>
>
> --- On Thu, 7/1/10, Garrett Cooper <yanefbsd at gmail.com> wrote:
>
>> From: Garrett Cooper <yanefbsd at gmail.com>
>> Subject: Re: NFS 75 second stall
>> To: "alan bryan" <alan.bryan at yahoo.com>
>> Cc: freebsd-stable at freebsd.org
>> Date: Thursday, July 1, 2010, 11:13 AM
>> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan at yahoo.com> wrote:
>> > Setup:
>> >
>> > server - FreeBSD 8-stable from today.  2 UFS dirs exported via NFS.
>> > client - FreeBSD 8.0-Release.  Running a test php script that
>> > copies around various files to/from 2 separate NFS mounts.
>> >
>> > Situation:
>> >
>> > script is started (forked to do 20 simultaneous runs) and 20 1GB
>> > files are copied to the NFS dir which works fine.  When it then
>> > switches to reading those files back and simultaneously writing to
>> > the other NFS mount I see a hang of 75 seconds.  If I do an "ls -l"
>> > on the NFS mount it hangs too.  After 75 seconds the client has
>> > reported:
>> >
>> > nfs server 192.168.10.133:/usr/local/export1: not responding
>> > nfs server 192.168.10.133:/usr/local/export1: is alive again
>> > nfs server 192.168.10.133:/usr/local/export1: not responding
>> > nfs server 192.168.10.133:/usr/local/export1: is alive again
>> >
>> > and then things start working again.  The server was originally
>> > FreeBSD 8.0-Release also but was upgraded to the latest stable to
>> > see if this issue could be avoided.
>> >
>> > # nfsstat -s -W -w 1
>> >  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>> >       0      0      0    222    257      0      0      0
>> >       0      0      0    178    135      0      0      0
>> >       0      0      0     85    127      0      0      0
>> >       0      0      0      0      0      0      0      0
>> >       0      0      0      0      0      0      0      0
>> >       0      0      0      0      0      0      0      0
>> >       0      0      0      0      0      0      0      0
>> >       0      0      0      0      0      0      0      0
>> >
>> > ... for 75 rows of all zeros
>> >
>> >       0      0      0    272    266      0      0      0
>> >       0      0      0    167    165      0      0      0
>> >
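
A side note on reading the counters above: with -w 1 each row covers one
second, so the longest run of all-zero rows approximates the length of the
stall. Below is a minimal Python sketch for measuring that, assuming the
nfsstat output has been saved with the quote markers stripped. The function
name longest_idle_run and the sample rows are made up for illustration.

```python
def longest_idle_run(lines):
    """Return the longest run of consecutive all-zero counter rows."""
    longest = current = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not a pure counter row
        # (the command line, header rows); these do not break a run.
        if not fields or not all(f.isdigit() for f in fields):
            continue
        if all(int(f) == 0 for f in fields):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Hypothetical, shortened sample in the same column layout as nfsstat -s -W -w 1.
sample = [
    " GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir",
    "      0      0      0     85    127      0      0      0",
    "      0      0      0      0      0      0      0      0",
    "      0      0      0      0      0      0      0      0",
    "      0      0      0    272    266      0      0      0",
]
print(longest_idle_run(sample))
```

Running it over the full capture from each test (15, 20 and 25 processes)
would give comparable stall lengths without counting rows by hand.
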
>> > I also tried runs with 15 simultaneous processes and 25.  15
>> > processes gave only about a 5 second stall but 25 gave again the
>> > same 75 second stall.
>> >
>> > Further, I tested with 2 mounts to the same server but from ZFS
>> > filesystems with the exact same stall/timeout periods.  So, it
>> > doesn't appear to matter what the underlying filesystem is - it's
>> > something in NFS or networking code.
>> >
>> > Any ideas on what's going on here?  What's causing the complete
>> > stall period of zero NFS activity?  Any flaws with my testing
>> > methods?
>> >
>> > Thanks for any and all help/ideas.
>>
>> What network driver are you using? Have you tried
>> tcpdumping the packets?
>> -Garrett
>>
>
> I'm using igb currently but have also used em.  I have not tried tcpdumping the packets yet on this test.  Any suggestions on things to look out for (I'm not that familiar with that whole process).
>
> Which brings up another point - I'm using TCP connections for NFS, not UDP.

    Is the net.inet.tcp.tso sysctl enabled? And are rxcsum and txcsum enabled on the interface?
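
    For reference, `sysctl net.inet.tcp.tso` prints the global TSO setting
(1 is on, 0 is off), and the options= line in the `ifconfig` output for the
interface lists RXCSUM, TXCSUM and TSO4 when those offloads are active. If a
pasted summary is easier, here is a rough Python sketch that pulls those
flags out of saved ifconfig output. The helper name offload_flags and the
sample options line are assumptions for illustration and are not taken from
the machines in this thread.

```python
import re

# Offload capabilities of interest, as they appear in ifconfig's options= list.
WANTED = ("RXCSUM", "TXCSUM", "TSO4")


def offload_flags(ifconfig_text):
    """Return {flag: bool} for each flag in WANTED.

    Only the first "options=" entry is inspected; in typical ifconfig
    output that is the interface capability line.
    """
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_text)
    enabled = set(match.group(1).split(",")) if match else set()
    return {flag: flag in enabled for flag in WANTED}


# Hypothetical sample line, for illustration only.
SAMPLE = "\toptions=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>"
print(offload_flags(SAMPLE))
```

To use it, save the output of ifconfig for whichever interface carries the
NFS traffic (igb0 would be my guess from your description) and pass the text
to offload_flags.
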
Thanks,
-Garrett

