Interactions with mxge, pf, nfsd, and the kernel
Bob Healey
healer at rpi.edu
Wed Jul 2 17:16:14 UTC 2014
At the moment, I am running 10.0-RELEASE, patched as far as freebsd-update took it on 6/12/14.
Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
healer at rpi.edu
(518) 276-4407
On 7/2/2014 12:59 PM, Adrian Chadd wrote:
> Hi,
>
> I vaguely recall some pf issues that caused the state table to not get
> flushed and things to get stuck. I think it was fixed post 10.0-REL.
>
> Maybe update to 10-STABLE and see?
>
>
> -a
>
>
> On 2 July 2014 08:11, Bob Healey <healer at rpi.edu> wrote:
>> Hello.
>>
>> I've been wrestling with this on and off for a few months now. I have an
>> assortment of systems (Dell PowerEdge R515 and R610, and IBM x3630M3) with
>> 10-gigabit Myricom Ethernet cards acting as NFS servers to Linux HPC compute
>> clusters (12-36 nodes, 384-480 cores) connected via gigabit Ethernet.
>> They are also connected to the outside world via onboard bce (Dell) or igb
>> (IBM) interfaces. After a variable length of time, a host loses all network
>> access. On the console, the machine is usually fully responsive. A reboot
>> clears the problem, but I have yet to find any sysctl or loader.conf
>> tunables that clear it and keep it away.
>>
>> PF is in use to restrict access to the host to a pair of public /24s and to
>> 10/8. If ZFS's sharenfs property can express that restriction, I'd be happy
>> to switch, but I really don't like leaving NFS open to the university's
>> quartet of /16s, so PF it is. The vlan2 interface has mxge0 as its parent.
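
For readers following the thread, the kind of PF restriction described above (two public /24s plus 10/8) could look roughly like the fragment below. This is only a sketch, not the poster's actual ruleset, which was not posted: the function name nfs_pf_rules and the table name nfs_clients are made up for illustration, and the two /24 prefixes are RFC 5737 documentation placeholders standing in for the real networks.

```shell
#!/bin/sh
# Emit a candidate pf.conf fragment on stdout.
# ASSUMPTIONS: the table name and both /24 prefixes are placeholders
# (RFC 5737 documentation ranges), not the networks from the original post.
nfs_pf_rules() {
    cat <<'EOF'
# Networks allowed to reach this host (placeholders plus 10/8).
table <nfs_clients> const { 192.0.2.0/24, 198.51.100.0/24, 10.0.0.0/8 }

# Do not filter loopback traffic.
set skip on lo0

# Default-deny inbound, allow the listed networks, allow all outbound.
block in all
pass in from <nfs_clients> to any keep state
pass out all keep state
EOF
}
```

Writing the output to a scratch file and running pfctl -nf on it parses the rules without loading them, which is a cheap way to check the syntax before touching the live ruleset.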
>>
>> Thanks for any help.
>>
>> This host is getting ready to crash soon, based on netstat.
>> root@husker:~ # netstat -i
>> Name    Mtu Network       Address                Ipkts Ierrs Idrop      Opkts Oerrs Coll
>> mxge0  9000 <Link#1>      00:60:dd:44:d2:0a    6358280   262     0    4061637     0    0
>> mxge0  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          2     -    -
>> bce0   1500 <Link#2>      08:9e:01:50:a1:ac     276391     0     0          0     0    0
>> bce0   1500 fe80::a9e:1ff fe80::a9e:1ff:fe5          0     -     -          3     -    -
>> bce1   1500 <Link#3>      08:9e:01:50:a1:ad 2229709391 16921     0 1182942116     0    0
>> bce1   1500 128.113.12.0  husker            2226254093     -     - 1183962005     -    -
>> bce1   1500 fe80::a9e:1ff fe80::a9e:1ff:fe5          0     -     -          3     -    -
>> lo0   16384 <Link#4>                              2030     0     0       2030     0    0
>> lo0   16384 localhost     ::1                        4     -     -          4     -    -
>> lo0   16384 fe80::1%lo0   fe80::1                    0     -     -          0     -    -
>> lo0   16384 your-net      localhost               2026     -     -       2026     -    -
>> vlan2  9000 <Link#5>      00:60:dd:44:d2:0a    4387250     0     0    3060586     0    0
>> vlan2  9000 10.2.3.0      husker.galactica.    4370309     -     -    3963931     -    -
>> vlan2  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          2     -    -
>> vlan2  9000 <Link#6>      00:60:dd:44:d2:0a    1971034     0     0    1001061     0    0
>> vlan2  9000 10.2.4.0      husker.enterprise    1700742     -     -    1961891     -    -
>> vlan2  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          4     -    -
>> root@husker:~ # netstat -im
>> 6157/3233/9390 mbufs in use (current/cache/total)
>> 4081/1883/5964/1018800 mbuf clusters in use (current/cache/total/max)
>> 4080/795 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/5/5/509399 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 512/23/535/150933 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/84899 16k jumbo clusters in use (current/cache/total/max)
>> 14309K/4801K/19110K bytes allocated to network (current/cache/total)
>> 10/1883/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 2/1736/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
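
The lines in the output above that stand out are the "denied" counters (mbufs, clusters, and 9k jumbo clusters). A small filter like the following could pull just those lines out when watching a host over time. It is a rough sketch: the function name denied_nonzero is invented here, and it assumes the counter field is the first whitespace-separated token, as in the output quoted above.

```shell
#!/bin/sh
# Read `netstat -m`-style text on stdin and print only the "denied" lines
# whose slash-separated counters are not all zero.
# ASSUMPTION: the counters are the first field, e.g. "10/1883/0" or "0".
denied_nonzero() {
    awk '/denied/ {
        n = split($1, c, "/")          # break "a/b/c" into individual counters
        for (i = 1; i <= n; i++)
            if (c[i] + 0 > 0) {        # any non-zero counter flags the line
                print
                break
            }
    }'
}
```

Typical use would be `netstat -m | denied_nonzero`, run periodically (for example from cron) so that growth in the denied counters can be compared against the time the host drops off the network.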
>> root@husker:~ # uptime
>> 11:07AM up 23 days, 19:27, 1 user, load averages: 0.14, 0.17, 0.13
>> root@husker:~ # sysctl -a | grep nmb
>> kern.ipc.nmbclusters: 1018800
>> kern.ipc.nmbjumbop: 509399
>> kern.ipc.nmbjumbo9: 452799
>> kern.ipc.nmbjumbo16: 339596
>> kern.ipc.nmbufs: 6520320
>> root@husker:~ # cat /boot/loader.conf
>> zfs_load="YES"
>> amdtemp_load="YES"
>> if_mxge_load="YES"
>> mxge_ethp_z8e_load="YES"
>> mxge_eth_z8e_load="YES"
>> mxge_rss_ethp_z8e_load="YES"
>> mxge_rss_eth_z8e_load="YES"
>> vfs.zfs.arc_max="12288M"
>> root@husker:~ # cat /var/run/dmesg.boot | head -16
>> Copyright (c) 1992-2014 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>> The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 10.0-RELEASE-p4 #0: Tue Jun 3 13:14:57 UTC 2014
>> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
>> CPU: AMD Opteron(tm) Processor 4122 (2200.07-MHz K8-class CPU)
>> Origin = "AuthenticAMD" Id = 0x100f80 Family = 0x10 Model = 0x8 Stepping = 0
>> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>> Features2=0x802009<SSE3,MON,CX16,POPCNT>
>> AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>> AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
>> TSC: P-state invariant
>> real memory = 17179869184 (16384 MB)
>> avail memory = 16588054528 (15819 MB)
>>
>>
>> --
>> Bob Healey
>> Systems Administrator
>> Biocomputation and Bioinformatics Constellation
>> and Molecularium
>> healer at rpi.edu
>> (518) 276-4407