Re: Performance issues with vnet jails + epair + bridge

From: Sad Clouds <cryintothebluesky_at_gmail.com>
Date: Thu, 12 Sep 2024 18:50:20 UTC
On Thu, 12 Sep 2024 13:25:32 -0400
Paul Procacci <pprocacci@gmail.com> wrote:

> You need to define `poor'.
> You need to show `top -SH` while the `problem' occurs.
> 
> My guess is packets are getting shuttled between a global taskqueue thread.
> This is the default, or at least I'm not aware of this default being
> changed.
> You can try enabling `options RSS' in your kernel as this would introduce a
> taskqueue worker thread per cpu.

Hello, a bit more info. I have a TCP benchmark which performs
simultaneous send and receive between server and client processes. This
is running on a 4-core Raspberry Pi 4 and uses a total of 8 threads: 4
threads for the server process and 4 threads for the client process.
Each thread opens a single TCP connection.
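For anyone who wants to approximate this load without my tcp_bench tool,
something like iperf3 (from ports/pkg, not in base) with 4 parallel
streams in each direction should produce a similar traffic pattern. The
flags below assume iperf3 3.7 or newer for --bidir; on older versions
you would run two clients, one with -R:

```shell
# Server side (e.g. inside the vnet jail):
iperf3 -s -D

# Client side (on the host): 4 parallel connections,
# sending and receiving simultaneously, report in MBytes/sec.
iperf3 -c 192.168.1.18 -P 4 --bidir -f M
```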

# ifconfig genet0
genet0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=280009<RXCSUM,VLAN_MTU,LINKSTATE,RXCSUM_IPV6>
        ether dc:a6:32:31:71:32
        inet 192.168.1.18 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

Running the TCP test via localhost keeps all CPUs busy and gives a
combined throughput of around 900 MiB/sec.

Running the TCP test between the host and the jail gives a combined
throughput of around 128 MiB/sec. Disabling RXCSUM and re-running the
test gives the same 128 MiB/sec.
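For reference, this is how I toggled the offload flag; the epair and
bridge member names below (epair0a, bridge0) are just examples, adjust
to your setup:

```shell
# Disable receive checksum offload on the physical NIC
ifconfig genet0 -rxcsum

# The epair ends and bridge report their own offload flags,
# which can be checked the same way
ifconfig epair0a
ifconfig bridge0
```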

Running "top -SH" shows this:

last pid:  4542;  load averages:  0.68,  0.42,  0.27                                            up 0+09:39:19  19:19:03
154 threads:   7 running, 126 sleeping, 21 waiting
CPU 0:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
CPU 1:  0.0% user,  0.0% nice, 15.6% system,  0.0% interrupt, 84.4% idle
CPU 2:  0.8% user,  0.0% nice, 42.2% system,  0.0% interrupt, 57.0% idle
CPU 3:  0.8% user,  0.0% nice, 23.4% system,  0.0% interrupt, 75.8% idle
Mem: 12M Active, 46M Inact, 274M Wired, 162M Buf, 3500M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -64    -     0B   608K CPU0     0   4:11 100.00% kernel{epair_task}
   11 root        187 ki31     0B    64K RUN      1 575:35  81.58% idle{idle: cpu1}
   11 root        187 ki31     0B    64K CPU3     3 574:58  74.14% idle{idle: cpu3}
   11 root        187 ki31     0B    64K RUN      2 576:04  60.94% idle{idle: cpu2}
 4542 root          4    0    29M  3408K select   2   0:01  11.95% tcp_bench{tcp_bench}
 4542 root         28    0    29M  3408K select   2   0:00  11.79% tcp_bench{tcp_bench}
 4542 root         32    0    29M  3408K select   3   0:01  10.85% tcp_bench{tcp_bench}
 4538 root         26    0    29M  3624K select   3   0:01  10.62% tcp_bench{tcp_bench}
 4538 root         21    0    29M  3624K CPU1     1   0:00  10.07% tcp_bench{tcp_bench}
 4538 root         26    0    29M  3624K select   1   0:01  10.05% tcp_bench{tcp_bench}
 4538 root         24    0    29M  3624K select   1   0:01   8.91% tcp_bench{tcp_bench}
 4542 root         31    0    29M  3408K select   1   0:01   8.78% tcp_bench{tcp_bench}
 4537 root         20    0    14M  3748K CPU2     2   0:00   0.24% top

I will try adding "options RSS" to the GENERIC config file and then
building a new kernel.
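For the archives, roughly the steps I plan to follow (the config name
RSSKERNEL is made up, and paths assume building from /usr/src on
arm64):

```shell
# Derive a custom kernel config from GENERIC and enable RSS
cd /usr/src/sys/arm64/conf
cp GENERIC RSSKERNEL
echo 'options RSS' >> RSSKERNEL

# Build and install the new kernel, then reboot
cd /usr/src
make -j4 buildkernel KERNCONF=RSSKERNEL
make installkernel KERNCONF=RSSKERNEL
shutdown -r now
```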