Myrinet 10GE performance on 7.0-CURRENT

Petr Holub hopet at ics.muni.cz
Tue Oct 9 10:53:00 PDT 2007


Dear all,

I've performed an initial set of experiments with FreeBSD 7.0-CURRENT
(built on Oct 8th) with Myrinet 10GE cards. The kernel is based on
GENERIC with the following options disabled:
#options        INVARIANTS
#options        INVARIANT_SUPPORT
#options        WITNESS
#options        WITNESS_SKIPSPIN
and SCHED_ULE instead of SCHED_4BSD. Userland is built using
production malloc.c (MALLOC_PRODUCTION defined in lib/libc/stdlib/malloc.c).
dmesg output is available at the end of the email (basically, 2x
dual-core Intel Xeon 5160 @ 3.00 GHz, identical machines for both sending
and receiving, running identical systems). The two machines are connected
point to point using LR XFPs and about 4 m of fiber.


The following tunables have been set:
net.inet.tcp.sendspace: 8388608
net.inet.tcp.recvspace: 8388608
net.inet.udp.recvspace: 8388608
net.inet.raw.recvspace: 8388608
kern.ipc.maxsockbuf: 10000000
on both sender and receiver.
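
As a quick sanity check on the values above, the per-protocol buffer sizes
should not exceed kern.ipc.maxsockbuf. The sketch below only encodes the
numbers listed in this mail. The helper names render_sysctl_conf and
oversized are hypothetical, and any extra per-mbuf overhead the kernel may
subtract from the limit is not modelled here.

```python
# Tunables exactly as listed above (values in bytes).
TUNABLES = {
    "net.inet.tcp.sendspace": 8388608,
    "net.inet.tcp.recvspace": 8388608,
    "net.inet.udp.recvspace": 8388608,
    "net.inet.raw.recvspace": 8388608,
    "kern.ipc.maxsockbuf": 10000000,
}


def render_sysctl_conf(tunables):
    """Return the tunables as name=value lines (sysctl.conf style)."""
    return "\n".join(f"{name}={value}" for name, value in tunables.items())


def oversized(tunables, limit_name="kern.ipc.maxsockbuf"):
    """Return names of buffer tunables larger than the socket buffer limit."""
    limit = tunables[limit_name]
    return [n for n, v in tunables.items() if n != limit_name and v > limit]


if __name__ == "__main__":
    print(render_sysctl_conf(TUNABLES))
    print("over limit:", oversized(TUNABLES) or "none")
```

With these values every buffer (8 MiB) stays below the 10000000-byte limit,
so the check reports nothing over the limit.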

sender:
[root@synchro-brno ~]# iperf -c 192.168.1.1 -u -l 8500 -i 1 -t 15 -b 9G -w 2M
------------------------------------------------------------
Client connecting to 192.168.1.1, UDP port 5001
Sending 8500 byte datagrams
UDP buffer size: 2.00 MByte
------------------------------------------------------------
[  3] local 192.168.1.2 port 55844 connected with 192.168.1.1 port 5001
[  3]  0.0- 1.0 sec  1.07 GBytes  9.21 Gbits/sec
[  3]  1.0- 2.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  2.0- 3.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  3.0- 4.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  4.0- 5.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  5.0- 6.0 sec  1.07 GBytes  9.21 Gbits/sec
[  3]  6.0- 7.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  7.0- 8.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  8.0- 9.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  9.0-10.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 10.0-11.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 11.0-12.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 12.0-13.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 13.0-14.0 sec  1.07 GBytes  9.21 Gbits/sec
[  3]  0.0-15.0 sec  16.1 GBytes  9.20 Gbits/sec
[  3] Sent 2030369 datagrams
[  3] Server Report:
[  3]  0.0-15.0 sec  16.1 GBytes  9.20 Gbits/sec  0.002 ms 1655/2030369 (0.082%)

receiver:
[root@synchro-plzen ~]# iperf -s -u -l 8500 -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 8500 byte datagrams
UDP buffer size: 8.00 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 5001 connected with 192.168.1.2 port 55844
[  3]  0.0- 1.0 sec  1.07 GBytes  9.21 Gbits/sec  0.004 ms    0/135463 (0%)
[  3]  1.0- 2.0 sec  1.07 GBytes  9.20 Gbits/sec  0.003 ms    0/135343 (0%)
[  3]  2.0- 3.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135363 (0%)
[  3]  3.0- 4.0 sec  1.07 GBytes  9.21 Gbits/sec  0.002 ms    0/135368 (0%)
[  3]  4.0- 5.0 sec  1.07 GBytes  9.20 Gbits/sec  0.003 ms    0/135337 (0%)
[  3]  5.0- 6.0 sec  1.07 GBytes  9.21 Gbits/sec  0.002 ms    0/135374 (0%)
[  3]  6.0- 7.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135336 (0%)
[  3]  7.0- 8.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135355 (0%)
[  3]  8.0- 9.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135306 (0%)
[  3]  9.0-10.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135355 (0%)
[  3] 10.0-11.0 sec  1.07 GBytes  9.20 Gbits/sec  0.003 ms    0/135329 (0%)
[  3] 11.0-12.0 sec  1.06 GBytes  9.09 Gbits/sec  0.003 ms 1655/135337 (1.2%)
[  3] 12.0-13.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135344 (0%)
[  3] 13.0-14.0 sec  1.07 GBytes  9.21 Gbits/sec  0.004 ms    0/135397 (0%)
[  3]  0.0-15.0 sec  16.1 GBytes  9.20 Gbits/sec  0.002 ms 1655/2030369 (0.082%)
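
The summary figures in the two reports above can be cross-checked from the
datagram count alone. This is only a sketch: it assumes iperf prints GBytes
in units of 2^30 bytes and Gbits/sec in units of 10^9 bits, which is what
makes the numbers agree. The function name summarize is hypothetical.

```python
DATAGRAM_BYTES = 8500   # iperf -l 8500
SENT = 2030369          # "Sent 2030369 datagrams"
LOST = 1655             # server report: 1655/2030369
DURATION_S = 15.0       # iperf -t 15


def summarize(sent, lost, size, seconds):
    """Recompute iperf-style totals from raw datagram counts."""
    total_bytes = sent * size
    return {
        "gbytes": total_bytes / 2**30,                   # assumed binary GBytes
        "gbits_per_s": total_bytes * 8 / seconds / 1e9,  # assumed decimal Gbits
        "loss_pct": 100.0 * lost / sent,
        "datagrams_per_s": sent / seconds,
    }


if __name__ == "__main__":
    for key, value in summarize(SENT, LOST, DATAGRAM_BYTES, DURATION_S).items():
        print(f"{key}: {value:.3f}")
```

Under those assumptions the result is about 16.1 GBytes, 9.20 Gbits/sec and
0.082% loss, at roughly 135 thousand datagrams per second, which matches the
per-interval counts on the receiver.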


CPU-wise, iperf takes 200% WCPU; about 36% is system time, 14% user time,
1.5% interrupt and 48.6% idle.

Sometimes I observe that, after some time, performance drops
from >9 Gbps to about 8.7 Gbps, as shown below:

[root@synchro-brno ~]# iperf -c 192.168.1.1 -u -l 8500 -i 1 -t 60 -b 9900M -w 2M
------------------------------------------------------------
Client connecting to 192.168.1.1, UDP port 5001
Sending 8500 byte datagrams
UDP buffer size: 2.00 MByte
------------------------------------------------------------
[  3] local 192.168.1.2 port 60761 connected with 192.168.1.1 port 5001
[  3]  0.0- 1.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  1.0- 2.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  2.0- 3.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  3.0- 4.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  4.0- 5.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  5.0- 6.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  6.0- 7.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  7.0- 8.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  8.0- 9.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  9.0-10.0 sec  1.12 GBytes  9.60 Gbits/sec
[  3] 10.0-11.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 11.0-12.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 12.0-13.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 13.0-14.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 14.0-15.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 15.0-16.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 16.0-17.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 17.0-18.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 18.0-19.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 19.0-20.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 20.0-21.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 21.0-22.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3]  0.0-22.9 sec  24.3 GBytes  9.11 Gbits/sec
[  3] Sent 3063929 datagrams
[  3] Server Report:
[  3]  0.0-22.9 sec  24.3 GBytes  9.11 Gbits/sec  0.003 ms    0/3063928 (0%)
[  3]  0.0-22.9 sec  1 datagrams received out-of-order

Sometimes I can also get very close to wirespeed, 9.90 Gbps, while
systat -ifstat 1 reports about 9.97 Gbps on both sender and receiver
(without any packet loss!). However, as shown above, this
is not stable: over longer periods the rate fluctuates between
8.7, 9.6, and 9.9 Gbps (e.g. it can run at one speed for a couple
of tens of seconds and then the performance changes either upwards
or downwards). As shown above, there are also occasional random
packet losses.
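
For reference, the ceiling for 8500-byte UDP datagrams on a 10 Gbit/s link
can be estimated from the per-frame overhead. The sketch below assumes
untagged Ethernet, IPv4 without options, and one datagram per frame (the
8528-byte IP packet fits in the 9000-byte MTU). Whether systat counts the
Ethernet header but not FCS, preamble and inter-frame gap is also an
assumption, not something verified here.

```python
LINE_RATE_BPS = 10e9        # nominal 10 Gigabit Ethernet line rate
PAYLOAD = 8500              # iperf -l 8500
UDP_HDR, IPV4_HDR = 8, 20   # UDP header, IPv4 header without options
ETH_HDR, ETH_FCS = 14, 4    # untagged Ethernet header and frame check sequence
PREAMBLE_SFD, IFG = 8, 12   # preamble + start delimiter, inter-frame gap


def rate_gbps(counted_bytes, wire_bytes, line_rate_bps=LINE_RATE_BPS):
    """Rate seen by a counter that sees counted_bytes of each wire_bytes."""
    return line_rate_bps * counted_bytes / wire_bytes / 1e9


# Bytes of link time consumed by one datagram.
wire = PAYLOAD + UDP_HDR + IPV4_HDR + ETH_HDR + ETH_FCS + PREAMBLE_SFD + IFG

# What an application-level tool could report at full line rate.
payload_rate = rate_gbps(PAYLOAD, wire)

# What an interface counter would report if it includes the Ethernet header.
l2_rate = rate_gbps(PAYLOAD + UDP_HDR + IPV4_HDR + ETH_HDR, wire)

if __name__ == "__main__":
    print(f"wire bytes per datagram: {wire}")
    print(f"payload rate at line rate: {payload_rate:.2f} Gbit/s")
    print(f"layer-2 rate at line rate: {l2_rate:.2f} Gbit/s")
```

Under these assumptions the payload ceiling is about 9.92 Gbit/s and the
layer-2 figure about 9.97 Gbit/s, so the 9.90 and 9.97 Gbps readings would
both be within a fraction of a percent of line rate.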

BTW, with WITNESS and INVARIANTS enabled, I get only about 2.8 Gbps,
and iperf eats about 200% WCPU, with about 50% of the time spent in system.

I will do more testing tomorrow. If you have some ideas for further tuning
and experiments, let me know.

Petr


==================== Machine info follows ===================================

[root@synchro-brno ~]# dmesg
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #1: Tue Oct  9 16:59:21 CEST 2007
    root@:/usr/obj/usr/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz (3000.12-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
usable memory = 4281200640 (4082 MB)
avail memory  = 4119842816 (3928 MB)
ACPI APIC Table: <PTLTD          APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <PTLTD   RSDT> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci2
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection Version - 6.5.3> port 0x2000-0x201f mem 0xd9200000-0xd921ffff irq 18 at device 0.0 on pci4
em0: Ethernet address: 00:30:48:33:86:5e
em0: [FILTER]
em1: <Intel(R) PRO/1000 Network Connection Version - 6.5.3> port 0x2020-0x203f mem 0xd9220000-0xd923ffff irq 19 at device 0.1 on pci4
em1: Ethernet address: 00:30:48:33:86:5f
em1: [FILTER]
pcib5: <ACPI PCI-PCI bridge> at device 0.3 on pci1
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci6: <ACPI PCI bus> on pcib6
pci6: <network, ethernet> at device 0.0 (no driver attached)
pcib7: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci7: <ACPI PCI bus> on pcib7
pci0: <base peripheral> at device 8.0 (no driver attached)
uhci0: <UHCI (generic) USB controller> port 0x1800-0x181f irq 17 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x1820-0x183f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x1840-0x185f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb2
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x1860-0x187f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
uhci3: [ITHREAD]
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb3
uhub3: 2 ports with 2 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xd9600400-0xd96007ff irq 17 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <EHCI (generic) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb4
uhub4: 8 ports with 8 removable, self powered
pcib8: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci8: <ACPI PCI bus> on pcib8
vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem 0xd0000000-0xd7ffffff,0xd9300000-0xd930ffff irq 18 at device 1.0 on pci8
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1890-0x189f at device 31.2 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xd2fff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ad0: 239372MB <WDC WD2500YS-01SHB1 20.06C06> at ata0-master SATA150
ad1: 239372MB <WDC WD2500YS-01SHB1 20.06C06> at ata0-slave SATA150
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:/dev/ad1s1a
mxge0: <Myri10G-PCIE-8A> mem 0xd8000000-0xd8ffffff,0xd9000000-0xd90fffff irq 16 at device 0.0 on pci6
mxge0: [ITHREAD]
mxge0: Ethernet address: 00:60:dd:47:6b:f3
mxge0: link state changed to UP
[root@synchro-brno ~]# kldstat
Id Refs Address            Size     Name
 1    8 0xffffffff80100000 b1af40   kernel
 2    1 0xffffffffb09d3000 88aa     if_mxge.ko
 3    1 0xffffffffb09dc000 a472     zlib.ko
 4    1 0xffffffffb19eb000 ca52     mxge_ethp_z8e.ko
 5    1 0xffffffffb19f9000 c8fd     mxge_eth_z8e.ko
[root@synchro-brno ~]# ifconfig mxge0
mxge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
        ether 00:60:dd:47:6b:f3
        inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet 10Gbase-LR (autoselect <full-duplex>)
        status: active


