Re: 100Gb performance
Date: Tue, 24 Jun 2025 14:45:02 UTC
On Mon, Jun 23, 2025 at 10:03 PM Daniel Braniss <danny@cs.huji.ac.il> wrote:
>
>
>
> On 22 Jun 2025, at 4:30, Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
>
> Have the TCP inflight buffers been tuned for the BDP of this link?
>
> When calculating delay do not just assume the one way link delay
> as what you really need is the delay until the sender gets to
> processing an ack.
>
> I would be interested in the values of the following, and
> the effect of doubling to quadrupling them has on the
> performance. (I believe I have shown the defaults)
>
> kern.ipc.maxsockbuf=2097152
> net.inet.tcp.recvbuf_max=2097152
> net.inet.tcp.sendbuf_max=2097152
> net.inet.tcp.recvspace=65536
> net.inet.tcp.sendspace=32768
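[Editorial aside, not part of the original mail: a quick way to read the current values and sanity-check them against the link's bandwidth-delay product. The 100 Gbit/s rate and 1 ms RTT below are illustrative numbers, not measurements from this thread.]
```
# Show the current socket-buffer limits in one go
sysctl kern.ipc.maxsockbuf net.inet.tcp.recvbuf_max net.inet.tcp.sendbuf_max \
       net.inet.tcp.recvspace net.inet.tcp.sendspace

# Rough BDP for a 100 Gbit/s path with a 1 ms round trip (hypothetical RTT):
#   100e9 bits/s / 8 * 0.001 s = 12,500,000 bytes (~12 MB)
# i.e. well above the 2 MB defaults listed above, so default-sized buffers
# would cap throughput far below line rate.
```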
>
>
> My numbers are very similar.
>
> I understand you can't modfy the server side, but I think you
> can extract the values being used there.
>
> Will see what I can get out of the isilon.
>
>
> What is the resulting Window Scale being used on the TCP three-way handshake (TWH)?
>
>
> How can I see that?
>
> In any case, it’s strange that the best throughput so far, running 5 rsyncs in parallel,
> was just above 1 GB/s, about 10% of the theoretical bandwidth.
> Will try RDMA soon.
The FreeBSD NFS client does not support RDMA.
Here's how I'd configure a client (assuming it's a fairly beefy system):
In /boot/loader.conf:
vfs.maxbcachebuf=1048576
In /etc/sysctl.conf:
kern.ipc.maxsockbuf=47370024 (or larger)
vfs.nfs.iodmax=64
Then I'd use these mount options (along with whatever you normally use,
except don't specify rsize, wsize since it should use whatever the server
supports):
nconnect=8,nocto,readahead=8,wcommitsize=67108864 (or larger)
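[Editorial aside: put together, the mount might look like the sketch below. The hostname, export path, and mount point are placeholders; the option values are the ones suggested above.]
```
mount -t nfs -o nfsv4,nconnect=8,nocto,readahead=8,wcommitsize=67108864 \
    server:/export /mnt/nfs
```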
To test write rate, I'd:
# dd if=/dev/zero of=<file on mount> bs=1M count=10240
for reading
# dd if=<file on mount> of=/dev/null bs=1M
(but umount/mount between the two "dd"s, so nothing is cached
in the client's buffer cache)
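[Editorial aside: as a concrete sequence, with mount point and file name as placeholders.]
```
# Write test
dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=10240
# Remount so the read test is not served from the client's buffer cache
umount /mnt/nfs
mount -t nfs -o nfsv4,nconnect=8,nocto,readahead=8,wcommitsize=67108864 server:/export /mnt/nfs
# Read test
dd if=/mnt/nfs/testfile of=/dev/null bs=1M
```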
If you are stuck at 1.2Gbytes/sec, there's some bottleneck, but
I can't say where.
rick
ps: The newnfs client threads do write-behind and read-ahead, so there
is some parallelism for the "dd"s.
>
> Thanks,
> Danny
>
>
> On Thu, Jun 19, 2025 at 2:34 PM Olivier Cochard-Labbé
> <olivier@freebsd.org> wrote:
>
>
>
>
> On Thu, Jun 19, 2025 at 4:31 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
>
> There is the "nconnect" mount option. It might help here.
>
>
> Interesting!
>
> Let's try:
>
> Server side:
> ```
> mkdir /tmp/nfs
> mount -t tmpfs tmpfs /tmp/nfs
> chmod 777 /tmp/nfs/
> cat > /etc/exports <<EOF
> V4: /tmp
> /tmp/nfs -network 1.1.1.0/24
> EOF
> sysrc nfs_server_enable=YES
> sysrc nfsv4_server_enable=YES
> sysrc nfsv4_server_only=YES
> service nfsd start
> ```
>
> Client side:
> ```
> mkdir /tmp/nfs
> sysrc nfs_client_enable=YES
> service nfsclient start
> ```
>
> Now testing standard speed:
> ```
> # mount -t nfs -o noatime,nfsv4 1.1.1.30:/nfs /tmp/nfs/
> # netstat -an -f inet -p tcp | grep 2049 | wc -l
> 1
> # dd if=/dev/zero of=/tmp/nfs/test bs=1G count=10
> 10+0 records in
> 10+0 records out
> 10737418240 bytes transferred in 8.526794 secs (1259256159 bytes/sec)
> # rm /tmp/nfs/test
> # umount /tmp/nfs
> ```
>
> And with nconnect=16:
> ```
> # mount -t nfs -o noatime,nfsv4,nconnect=16 1.1.1.30:/nfs /tmp/nfs/
> # dd if=/dev/zero of=/tmp/nfs/test bs=1G count=10
> 10+0 records in
> 10+0 records out
> 10737418240 bytes transferred in 8.633871 secs (1243638980 bytes/sec)
> # rm /tmp/nfs/test
> # netstat -an -f inet -p tcp | grep 2049 | wc -l
> 16
> ```
>
> => No difference here, but 16 output queues were correctly used with nconnect=16.
> How is load-sharing done with NFS nconnect?
> I've tested with benchmarks/fio using parallel jobs and I don't see any improvement either.
>
> Here's a few other things you can try..
> On the server:
> - add nfs_server_maxio=1048576 to /etc/rc.conf.
>
> On the client:
> - put vfs.maxbcachebuf=1048576 in /boot/loader.conf
> - use "wcommitsize=<some large value>" as an additional mount option.
>
> On both client and server, bump kern.ipc.maxsockbuf up a bunch.
>
> Once you do the mount do
> # nfsstat -m
> on the client and you should see the rsize/wsize set to 1048576
> and a large value for wcommitsize
>
> For reading, you should also use "readahead=8" as a mount option.
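[Editorial aside: taken together, the steps above amount to something like the sketch below. The 64 MB maxsockbuf and the mount arguments are illustrative values, and server:/export and /mnt/nfs are placeholders.]
```
# Server
sysrc nfs_server_maxio=1048576          # then: service nfsd restart
sysctl kern.ipc.maxsockbuf=67108864     # example value; pick something large

# Client
echo 'vfs.maxbcachebuf=1048576' >> /boot/loader.conf   # takes effect after reboot
sysctl kern.ipc.maxsockbuf=67108864
mount -t nfs -o nfsv4,readahead=8,wcommitsize=67108864 server:/export /mnt/nfs
nfsstat -m    # expect rsize/wsize=1048576 and the large wcommitsize
```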
>
> Also, if you can turn down (or turn off) interrupt moderation on the
> NIC driver, try that. (Interrupt moderation is great for data streaming
> in one direction but is not so good for NFS, which consists of bidirectional
> traffic of mostly small RPC messages. Every Write gets a small reply message
> in the server->client direction to complete the Write, and delaying the processing
> of these small received messages slows NFS down.)
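[Editorial aside: the interrupt-moderation knobs are driver-specific, and the sysctl node below is an assumption for a Mellanox mlx5en(4) NIC; substitute the node for whatever driver and unit your NIC actually uses.]
```
# List candidate interrupt-moderation/coalescing knobs the driver exposes
sysctl -d dev.mce.0 | grep -i coalesce    # node name is an assumption
```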
>
> rick
>
>
> Regards,
> Olivier
>
>
>
>
>
> --
> Rod Grimes rgrimes@freebsd.org
>
>