on 9.2-stable nfs/zfs and 10g hang

Daniel Braniss danny at cs.huji.ac.il
Mon Jan 20 07:02:56 UTC 2014


On Jan 18, 2014, at 6:13 PM, Adrian Chadd <adrian at freebsd.org> wrote:

> Hi!
> 
> Please try reducing the size down to 32k but leave TSO enabled.
> 
did so, it worked ok, but took longer:
with TSO disabled:   14834.61 real    609.29 user    1996.90 sys
with TSO + 32k:      15714.46 real    639.98 user    1828.07 sys
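
As a rough check on the two timings above, a minimal Python sketch that converts them to hours and compares them. The figures are copied from the time(1) output above; the variable and function names are illustrative only.

```python
# Timings copied from the two runs reported above (seconds, from time(1)).
runs = {
    "tso_disabled": {"real": 14834.61, "user": 609.29, "sys": 1996.90},
    "tso_32k": {"real": 15714.46, "user": 639.98, "sys": 1828.07},
}


def summarize(run):
    """Return wall-clock hours and total CPU seconds (user + sys) for one run."""
    return {"hours": run["real"] / 3600.0, "cpu": run["user"] + run["sys"]}


base = summarize(runs["tso_disabled"])
test = summarize(runs["tso_32k"])

# Relative wall-clock slowdown of the TSO + 32k run versus TSO disabled.
slowdown_pct = (runs["tso_32k"]["real"] / runs["tso_disabled"]["real"] - 1.0) * 100.0

print("TSO disabled: %.2f h wall, %.2f s CPU" % (base["hours"], base["cpu"]))
print("TSO + 32k:    %.2f h wall, %.2f s CPU" % (test["hours"], test["cpu"]))
print("wall-clock slowdown: %.2f%%" % slowdown_pct)
```

On these figures the TSO + 32k run takes roughly 6% longer in wall-clock time while using somewhat less total CPU time, mostly because of the lower sys figure.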

> It's 9.2, so there may be some bugfixes that haven't been backported
> from 10 or -HEAD. Would you be able to try a -HEAD snapshot here?
> 
ENOTIME :-).

> What's the NFS server and hosts? I saw the core.txt.16 that says
> "ix0/ix1" so I can glean the basic chipset family but which NIC in
> particular is it? What would people need to try and reproduce it?
> 
The hosts involved are Dell 720/710,
the 10G cards are Intel:

ix0 at pci0:5:0:0: class=0x020000 card=0x7a118086 chip=0x10fb8086 rev=0x01 hdr=0x00
   vendor     = 'Intel Corporation'
   device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
   class      = network
   subclass   = ethernet

the server is exporting a big ZFS file system, which is served via 2 RAID controllers:

mfi1 at pci0:65:0:0:       class=0x010400 card=0x1f2d1028 chip=0x005b1000 rev=0x05 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'MegaRAID SAS 2208 [Thunderbolt]'
    class      = mass storage
    subclass   = RAID
mfi2 at pci0:66:0:0:       class=0x010400 card=0x1f151028 chip=0x00791000 rev=0x05 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'MegaRAID SAS 2108 [Liberator]'
    class      = mass storage
    subclass   = RAID

- just had the driver card lying around-

I will try a different client, which has a Broadcom NIC, later.

Q: is the TSO bug in the NIC/driver or in the kernel or both?
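
One way to start narrowing this down is to vary TSO and the NFS transfer size independently, as was done in this thread. A minimal Python sketch that only builds the command lines for the two configurations tried so far; it does not run them. The interface name ix0 comes from the pciconf output above, while the export "server:/export" and mount point "/mnt" are hypothetical placeholders. Which host the TSO setting should be changed on is not stated in the thread, so that is left to the reader.

```python
def tso_command(ifname, enable):
    """Build the ifconfig(8) argv that turns TSO on or off for one interface."""
    return ["ifconfig", ifname, "tso" if enable else "-tso"]


def nfs_mount_command(export, mountpoint, io_size=None):
    """Build a mount(8) argv for an NFS-over-TCP mount.

    io_size, when given, is used for both rsize and wsize (in bytes).
    """
    opts = ["tcp"]
    if io_size is not None:
        opts += ["rsize=%d" % io_size, "wsize=%d" % io_size]
    return ["mount", "-t", "nfs", "-o", ",".join(opts), export, mountpoint]


# Hypothetical placeholders, not taken from the thread.
EXPORT = "server:/export"
MOUNTPOINT = "/mnt"

cases = {
    "TSO disabled, default transfer size": [
        tso_command("ix0", enable=False),
        nfs_mount_command(EXPORT, MOUNTPOINT),
    ],
    "TSO enabled, 32k transfer size": [
        tso_command("ix0", enable=True),
        nfs_mount_command(EXPORT, MOUNTPOINT, io_size=32768),
    ],
}

for name, commands in cases.items():
    print(name)
    for argv in commands:
        print("    " + " ".join(argv))
```

Running the same find + md5 workload under each case, and again with a different NIC, would show whether the hang follows the TSO setting, the transfer size, or the card.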

cheers
	danny




> 
> -a
> 
> 
> On 18 January 2014 03:24, Daniel Braniss <danny at cs.huji.ac.il> wrote:
>> 
>> On Jan 17, 2014, at 4:47 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>> 
>>> Daniel Braniss wrote:
>>>> hi all,
>>>> 
>>>> All was going ok till I decided to connect this host via a 10g nic
>>>> and very soon it started
>>>> to hang. Running multiple make buildworlds from other hosts connected
>>>> via 10g and
>>>> using both src and obj on the server via tcp/nfs did ok. but running
>>>>     find … -exec md5 {} + (the find finds over 6M files)
>>>> from another host (at 10g) will hang it very quickly.
>>>> 
>>>> If I wait a while (can’t be more specific) it sometimes recovers -
>>>> but my users are not very
>>>> patient :-)
>>>> 
>>> This suggests that an RPC request/reply gets dropped in a way that TCP
>>> doesn't recover. Eventually (after up to about 15min, I think?) the TCP
>>> connection will be shut down and a new TCP connection started, with a
>>> retry of outstanding RPCs.
>>> 
>>>> I will soon try the same experiment using the old 1G nic, but in the
>>>> meantime, if someone
>>>> could shed some light, it would be very helpful
>>>> 
>>>> I’m attaching core.txt, but if it doesn’t make it, it’s also
>>>> available at:
>>>>     ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>>>> 
>>> You might try disabling TSO on the net interface. There have been issues
>>> with TSO for segments around 64K in the past (or use rsize=32768,wsize=32768
>>> options on the client mount, to avoid RPCs over about 32K in size).
>>> 
>> BINGO! disabling tso did it. I’ll try reducing the packet size later.
>> some numbers:
>> there were some 7*10^6 files
>> doing it locally (the find + md5) took about 3 hrs,
>> via nfs at 1g took 11 hrs,
>> at 10g it took 4 hrs.
>> 
>> thanks!
>>        danny
>> 
>> 
>>> Beyond that, capturing a packet trace for the case that hangs easily and
>>> looking at what goes on near the end of it in wireshark might give you
>>> a hint about what is going on.
>>> 
>>> rick
>>> 
>>>> thanks,
>>>>     danny
>>>> _______________________________________________
>>>> freebsd-stable at freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>> To unsubscribe, send any mail to
>>>> "freebsd-stable-unsubscribe at freebsd.org"
>>>> 
>> 


