From: bugzilla-noreply@freebsd.org
To: net@FreeBSD.org
Subject: [Bug 258623] [routing] performance - 2 numa domains vs single numa domain
Date: Mon, 20 Sep 2021 10:24:43 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258623

            Bug ID: 258623
           Summary: [routing] performance - 2 numa domains vs single numa domain
           Product: Base System
           Version: 13.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: konrad.kreciwilk@korbank.pl
                CC: net@FreeBSD.org

Server: Dell R630, 2x CPU E5-2667 v4 (2 NUMA domains), 64GB RAM
NIC: 2x T62100-SO-CR, each connected to a separate NUMA domain

* 2 NUMA domain test

I use chelsio_affinity to bind the IRQs to the correct CPUs.

cfg:
ifconfig_cc0="up"
ifconfig_cc1="up"
ifconfig_cc2="up"
ifconfig_cc3="up"
#LAGG LACP
ifconfig_lagg0="laggproto lacp laggport cc0 laggport cc2 -wol -vlanhwtso -tso -lro -hwrxtstmp -txtls use_flowid use_numa up"
ifconfig_vlan2020="vlan 2020 vlandev lagg0"
ifconfig_vlan2002="vlan 2002 vlandev lagg0"

+--------+         +--------+      +---------+
|        +---------+        +------+         |
| Router |  lagg0  | switch |      |   gen   |
|        +---------+        +------+         |
+--------+         +--------+      +---------+

I can achieve around 14 Mpps without drops.
Above this level, drops appear on the ccX/lagg0 interfaces, even though the CPUs still seem to have some free resources:

# netstat -i -I lagg0 1
            input          lagg0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
  15939431     0 555822 2246265134   15381955     0 2167675870     0
  16600413     0 612946 2339414686   15978803     0 2253137798     0
  15259699     0 575481 2150765886   14693013     0 2070319352     0
  15935269     0 512558 2245569909   15382551     0 2167518240     0
  16159627     0 616404 2277463695   15563046     0 2195364136     0
  14841125     0 322695 1605926868   14540305     0 1562096456     0

# top -PSH
last pid:  9745;  load averages:  6.46,  2.02,  0.76   up 0+00:02:06  20:25:17
580 threads:   25 running, 471 sleeping, 84 waiting
CPU 0:   0.0% user,  0.0% nice,  0.0% system, 59.2% interrupt, 40.8% idle
CPU 1:   0.0% user,  0.0% nice,  0.0% system, 57.7% interrupt, 42.3% idle
CPU 2:   0.0% user,  0.0% nice,  0.0% system, 57.7% interrupt, 42.3% idle
CPU 3:   0.0% user,  0.0% nice,  0.0% system, 60.6% interrupt, 39.4% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system, 56.3% interrupt, 43.7% idle
CPU 5:   0.0% user,  0.0% nice,  0.0% system, 62.0% interrupt, 38.0% idle
CPU 6:   0.0% user,  0.0% nice,  0.0% system, 59.2% interrupt, 40.8% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system, 53.5% interrupt, 46.5% idle
CPU 8:   0.0% user,  0.0% nice,  1.4% system, 62.0% interrupt, 36.6% idle
CPU 9:   0.0% user,  0.0% nice,  0.0% system, 67.6% interrupt, 32.4% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system, 69.0% interrupt, 31.0% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system, 66.2% interrupt, 33.8% idle
CPU 12:  0.0% user,  0.0% nice,  0.0% system, 63.4% interrupt, 36.6% idle
CPU 13:  0.0% user,  0.0% nice,  0.0% system, 62.0% interrupt, 38.0% idle
CPU 14:  0.0% user,  0.0% nice,  0.0% system, 63.4% interrupt, 36.6% idle
CPU 15:  0.0% user,  0.0% nice,  0.0% system, 63.4% interrupt, 36.6% idle
Mem: 536M Active, 29M Inact, 1528M Wired, 60G Free
ARC: 114M Total, 22M MFU, 88M MRU, 693K Header, 3231K Other
     30M Compressed, 92M Uncompressed, 3.12:1 Ratio
Swap: 32G Total, 32G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   12 root        -92    -     0B  1472K CPU8     8   0:37  60.50% intr{irq152: t6nex1:0a0}
   12 root        -92    -     0B  1472K CPU10   10   0:36  60.42% intr{irq154: t6nex1:0a2}
   12 root        -92    -     0B  1472K CPU11   11   0:36  60.27% intr{irq155: t6nex1:0a3}
   12 root        -92    -     0B  1472K CPU14   14   0:36  60.26% intr{irq158: t6nex1:0a6}
   12 root        -92    -     0B  1472K CPU12   12   0:36  60.24% intr{irq156: t6nex1:0a4}
   12 root        -92    -     0B  1472K CPU9     9   0:36  60.15% intr{irq153: t6nex1:0a1}
   12 root        -92    -     0B  1472K CPU13   13   0:36  59.88% intr{irq157: t6nex1:0a5}
   12 root        -92    -     0B  1472K CPU15   15   0:36  59.41% intr{irq159: t6nex1:0a7}
   12 root        -92    -     0B  1472K WAIT     0   0:37  58.49% intr{irq98: t6nex0:0a0}
   12 root        -92    -     0B  1472K WAIT     1   0:37  57.89% intr{irq99: t6nex0:0a1}
   12 root        -92    -     0B  1472K WAIT     4   0:37  57.39% intr{irq102: t6nex0:0a4}
   12 root        -92    -     0B  1472K WAIT     5   0:36  57.35% intr{irq103: t6nex0:0a5}
   12 root        -92    -     0B  1472K WAIT     3   0:36  57.32% intr{irq101: t6nex0:0a3}
   12 root        -92    -     0B  1472K WAIT     6   0:36  57.12% intr{irq104: t6nex0:0a6}
   12 root        -92    -     0B  1472K WAIT     2   0:36  56.98% intr{irq100: t6nex0:0a2}
   12 root        -92    -     0B  1472K WAIT     7   0:36  56.85% intr{irq105: t6nex0:0a7}

# pcm-numa.x
Time elapsed: 1064 ms
Core | IPC  | Instructions | Cycles | Local DRAM accesses | Remote DRAM Accesses
   0 | 1.31 |       4195 M | 3203 M |              3382 K |                 48 K
   1 | 1.32 |       4211 M | 3199 M |              3241 K |                 27 K
   2 | 1.33 |       4238 M | 3196 M |              3146 K |                 48 K
   3 | 1.33 |       4238 M | 3197 M |              3143 K |                 26 K
   4 | 1.32 |       4228 M | 3197 M |              3241 K |                 47 K
   5 | 1.33 |       4243 M | 3198 M |              3046 K |                 29 K
   6 | 1.33 |       4247 M | 3195 M |              3169 K |                 47 K
   7 | 1.33 |       4264 M | 3196 M |              3180 K |                 20 K
   8 | 1.29 |       4159 M | 3224 M |              2948 K |                 77 K
   9 | 1.29 |       4172 M | 3224 M |              2865 K |                 92 K
  10 | 1.29 |       4199 M | 3247 M |              3263 K |                 76 K
  11 | 1.30 |       4237 M | 3259 M |              2892 K |                 91 K
  12 | 1.30 |       4261 M | 3274 M |              3069 K |                 73 K
  13 | 1.30 |       4231 M | 3246 M |              2959 K |                104 K
  14 | 1.30 |       4291 M | 3291 M |              3353 K |                 74 K
  15 | 1.31 |       4221 M | 3227 M |              3008 K |                 85 K

pmcstat -S cpu_clk_unhalted.thread flamegraph: https://files.fm/u/enhy23ffr
--------------------

* single domain test

In this scenario I create the VLANs on cc0 only (using one NUMA domain):
ifconfig_vlan2020="vlan 2020 vlandev cc0"
ifconfig_vlan2002="vlan 2002 vlandev cc0"

+--------+         +--------+      +---------+
|        +---------+        +------+         |
| Router |   cc0   | switch |      |   gen   |
|        |         |        +------+         |
+--------+         +--------+      +---------+

Using cc0 alone, I can achieve about 16 Mpps without drops:

# netstat -i -I cc0 1
            input            cc0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
  15934346     0      0 2245565269   15933728     0 2245477291     0
  15927621     0      0 2244617740   15928235     0 2244704202     0
  15934688     0      0 2245613662   15934213     0 2245546449     0
  15931155     0      0 2245115588   15931208     0 2245120654     0
  15926995     0      0 2244529583   15927391     0 2244585093     0
  15931114     0      0 2245109534   15931145     0 2245115823     0

# top -PSH
last pid:  9976;  load averages:  6.57,  2.51,  1.00
up 0+00:03:23  20:16:17
579 threads:   25 running, 470 sleeping, 84 waiting
CPU 0:   0.0% user,  0.0% nice,  0.0% system, 95.4% interrupt,  4.6% idle
CPU 1:   0.0% user,  0.0% nice,  0.0% system, 95.4% interrupt,  4.6% idle
CPU 2:   0.0% user,  0.0% nice,  0.0% system, 94.7% interrupt,  5.3% idle
CPU 3:   0.0% user,  0.0% nice,  0.0% system, 93.9% interrupt,  6.1% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system, 94.7% interrupt,  5.3% idle
CPU 5:   0.0% user,  0.0% nice,  0.0% system, 94.7% interrupt,  5.3% idle
CPU 6:   0.0% user,  0.0% nice,  0.0% system, 94.7% interrupt,  5.3% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system, 93.1% interrupt,  6.9% idle
CPU 8:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 9:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 12:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 13:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 14:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 15:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 537M Active, 30M Inact, 1260M Wired, 60G Free
ARC: 115M Total, 22M MFU, 89M MRU, 695K Header, 3260K Other
     30M Compressed, 93M Uncompressed, 3.10:1 Ratio
Swap: 32G Total, 32G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   12 root        -92    -     0B  1472K CPU3     3   1:50  94.86% intr{irq101: t6nex0:0a3}
   12 root        -92    -     0B  1472K CPU1     1   1:49  94.68% intr{irq99: t6nex0:0a1}
   12 root        -92    -     0B  1472K CPU5     5   1:49  94.40% intr{irq103: t6nex0:0a5}
   12 root        -92    -     0B  1472K CPU7     7   1:49  94.18% intr{irq105: t6nex0:0a7}
   12 root        -92    -     0B  1472K CPU0     0   1:49  94.13% intr{irq98: t6nex0:0a0}
   12 root        -92    -     0B  1472K CPU6     6   1:49  94.11% intr{irq104: t6nex0:0a6}
   12 root        -92    -     0B  1472K CPU4     4   1:49  93.81% intr{irq102: t6nex0:0a4}
   12 root        -92    -     0B  1472K CPU2     2   1:48  93.56% intr{irq100: t6nex0:0a2}

# pcm-numa.x
Time elapsed: 1002 ms
Core | IPC  | Instructions | Cycles | Local DRAM accesses | Remote DRAM Accesses
   0 | 1.93 |       6513 M | 3374 M |              4179 K |                 34 K
   1 | 1.93 |       6516 M | 3374 M |              4153 K |                 3655
   2 | 1.94 |       6518 M | 3352 M |              4122 K |                 33 K
   3 | 1.94 |       6516 M | 3367 M |              4118 K |                 8574
   4 | 1.94 |       6517 M | 3361 M |              4142 K |                 37 K
   5 | 1.93 |       6516 M | 3376 M |              4147 K |                 10 K
   6 | 1.93 |       6515 M | 3371 M |              4154 K |                 39 K
   7 | 1.94 |       6514 M | 3360 M |              4173 K |                 12 K
   8 | 0.24 |       1833 K | 7596 K |                1805 |                 1378
   9 | 0.20 |        728 K | 3726 K |                 467 |                  502
  10 | 0.11 |        312 K | 2779 K |                 227 |                  234
  11 | 0.14 |        486 K | 3407 K |                 291 |                  361
  12 | 0.12 |        357 K | 2956 K |                 183 |                  132
  13 | 0.07 |        195 K | 2664 K |                  46 |                  119
  14 | 0.13 |        381 K | 3047 K |                 455 |                  212
  15 | 0.23 |        765 K | 3310 K |                 325 |                  346
---------------------------------------------------------------------
pmcstat -S cpu_clk_unhalted.thread flamegraph: https://files.fm/u/3njfz2r3g

* Summary

I know lagg adds a certain amount of overhead, but based on my testing, a single card performs better than two cards in lagg0: cc0 alone forwards about 16 Mpps without drops, whereas lagg0 spanning both NUMA domains starts dropping above about 14 Mpps, even though every CPU still shows at least 31% idle time.
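To put a number on the drops, the per-second netstat samples above can be averaged and turned into a drop ratio. The following is a minimal sketch, assuming the 8-column "netstat -i -I <iface> 1" layout shown in this report and that each data row is a per-second delta; netstat_summary is a hypothetical helper, not a FreeBSD utility.

```shell
#!/bin/sh
# netstat_summary: hypothetical helper, not part of FreeBSD.
# Reads "netstat -i -I <iface> 1" output on stdin and prints the sample
# count, the average input/output packets per second and the share of
# input packets counted as idrops.
# Assumed column layout (as in the captures in this report):
#   in-packets in-errs in-idrops in-bytes out-packets out-errs out-bytes colls
netstat_summary() {
    awk '
        # Data rows have 8 fields and a numeric first column; the two
        # header lines ("input ... output", "packets errs ...") are skipped.
        NF == 8 && $1 ~ /^[0-9]+$/ {
            n++
            rx_pkts += $1
            idrops  += $3
            tx_pkts += $5
        }
        END {
            if (n == 0 || rx_pkts == 0) {
                print "netstat_summary: no samples" > "/dev/stderr"
                exit 1
            }
            printf "samples=%d avg_in_pps=%.0f avg_out_pps=%.0f idrop_pct=%.2f\n",
                n, rx_pkts / n, tx_pkts / n, 100 * idrops / rx_pkts
        }'
}

# Possible usage: two header lines plus six one-second samples.
# If your netstat prints since-boot totals on the first data row, drop it.
# netstat -i -I lagg0 1 | head -n 8 | netstat_summary
```

Fed the six lagg0 samples above, it prints "samples=6 avg_in_pps=15789261 avg_out_pps=15256612 idrop_pct=3.37", i.e. about 3.4% of received packets end up as idrops, which accounts for almost all of the gap between input and output packets. The six cc0 samples give "idrop_pct=0.00".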