zfs/nfsd performance limiter

From: Adam Stylinski <kungfujesus06_at_gmail.com>
Date: Thu, 19 May 2022 00:04:14 UTC
Hello,

I have two systems connected via Mellanox ConnectX-3 cards in Ethernet
mode.  Their MTUs are maxed at 9000, their ring buffers are maxed at
8192, and I can hit around 36 Gbps with iperf.

When using an NFS client (client = Linux, server = FreeBSD), I see a
maximum rate of around 20 Gbps.  The test file is fully in ARC.  The
test is performed with an NFS mount using nconnect=4 and an
rsize/wsize of 1 MB.
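
The client-side mount described above can be sketched roughly as
follows.  The server name, export path, and mount point are
placeholders (assumptions for illustration only); just the nconnect,
rsize, and wsize values come from the description above, with 1 MB
written out as 1048576 bytes.  The NFS version is not stated, so none
is set here.

```shell
#!/bin/sh
# Hypothetical names: substitute the real server, export, and mount point.
SERVER="nfs-server.example"
EXPORT="/tank/export"
MOUNTPOINT="/mnt/nfs"

# Options taken from the description: 4 TCP connections, 1 MB read/write size.
OPTS="nconnect=4,rsize=1048576,wsize=1048576"

# Build the Linux client mount command.  It is only printed, not run,
# because mounting needs root and a reachable server.
CMD="mount -t nfs -o $OPTS $SERVER:$EXPORT $MOUNTPOINT"
echo "$CMD"
```

After mounting, the options the client actually negotiated can be
checked in /proc/mounts, since the server may cap rsize/wsize below
the requested value.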

Here's the flame graph of the kernel of the system in question, with
idle stacks removed:

https://gist.github.com/KungFuJesus/918c6dcf40ae07767d5382deafab3a52#file-nfs_fg-svg

The longest function seems to be the ERMS-aware memcpy, presumably
copying data out of the ARC.  Is there maybe a missing fast path that
could take fewer copies into the socket buffer?