on 9.2-stable nfs/zfs and 10g hang

Adrian Chadd adrian at freebsd.org
Sat Jan 18 16:13:23 UTC 2014


Hi!

Please try reducing the NFS read/write size (rsize/wsize) down to 32k but leave TSO enabled.
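
A minimal sketch of that test, written as a dry run that only prints the commands rather than executing them. The interface name ix0 comes from core.txt.16. The server name, export path and mount point are hypothetical placeholders, and which host needs TSO turned back on is an assumption (whichever one had it disabled for the earlier test). The rsize/wsize values are the ones Rick suggested for the client mount.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# NFS_SERVER, EXPORT_PATH and MOUNT_POINT are hypothetical placeholders.
IFACE="ix0"                          # interface name seen in core.txt.16
NFS_SERVER="nfsserver.example.org"   # hypothetical
EXPORT_PATH="/export"                # hypothetical
MOUNT_POINT="/mnt"                   # hypothetical
IO_SIZE=32768                        # 32K read/write size

# Turn TSO back on for the interface (it was disabled for the earlier test).
tso_cmd() {
    echo "ifconfig ${IFACE} tso"
}

# Client-side mount over TCP with rsize/wsize capped at 32K.
mount_cmd() {
    echo "mount -t nfs -o tcp,rsize=${IO_SIZE},wsize=${IO_SIZE} ${NFS_SERVER}:${EXPORT_PATH} ${MOUNT_POINT}"
}

tso_cmd
mount_cmd
```

To run it for real, replace the placeholders and execute the printed lines as root on the appropriate hosts; `ifconfig ix0 -tso` turns TSO off again if the hang comes back.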

It's 9.2, so there may be some bugfixes that haven't been backported
from 10 or -HEAD. Would you be able to try a -HEAD snapshot here?

What are the NFS server and hosts? I saw the core.txt.16 that says
"ix0/ix1" so I can glean the basic chipset family but which NIC in
particular is it? What would people need to try and reproduce it?


-a


On 18 January 2014 03:24, Daniel Braniss <danny at cs.huji.ac.il> wrote:
>
> On Jan 17, 2014, at 4:47 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
>> Daniel Braniss wrote:
>>> hi all,
>>>
>>> All was going ok till I decided to connect this host via a 10g nic
>>> and very soon it started
>>> to hang. Running multiple make buildworlds from other hosts connected
>>> via 10g and
>>> using both src and obj on the server via tcp/nfs did ok. but running
>>>      find … -exec md5 {} + (the find finds over 6M files)
>>> from another host (at 10g) will hang it very quickly.
>>>
>>> If I wait a while (can’t be more specific) it sometimes recovers -
>>> but my users are not very
>>> patient :-)
>>>
>> This suggests that an RPC request/reply gets dropped in a way that TCP
>> doesn't recover. Eventually (after up to about 15min, I think?) the TCP
>> connection will be shut down and a new TCP connection started, with a
>> retry of outstanding RPCs.
>>
>>> I will soon try the same experiment using the old 1G nic, but in the
>>> meantime, if someone
>>> could shed some light, it would be very helpful
>>>
>>> I’m attaching core.txt, but if it doesn’t make it, it’s also
>>> available at:
>>>      ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>>>
>> You might try disabling TSO on the net interface. There have been issues
>> with TSO for segments around 64K in the past (or use rsize=32768,wsize=32768
>> options on the client mount, to avoid RPCs over about 32K in size).
>>
> BINGO! disabling tso did it. I’ll try reducing the packet size later.
> some numbers:
> there were some 7*10^6 files
> doing it locally (the find + md5) took about 3 hrs,
> via nfs at 1g took 11 hrs.
> at 10g it took 4 hrs.
>
> thanks!
>         danny
>
>
>> Beyond that, capturing a packet trace for the case that hangs easily and
>> looking at what goes on near the end of it in wireshark might give you
>> a hint about what is going on.
>>
>> rick
>>
>>> thanks,
>>>      danny
>>> _______________________________________________
>>> freebsd-stable at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to
>>> "freebsd-stable-unsubscribe at freebsd.org"
>>>
>
