Re: 100Gb performance
- Reply: Rick Macklem: "Re: 100Gb performance"
- In reply to: Rodney W. Grimes: "Re: 100Gb performance"
Date: Mon, 23 Jun 2025 14:36:32 UTC
On Sat, Jun 21, 2025 at 9:31 PM Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
> Have the TCP inflight buffers been tuned for the BDP of this link?
>
> When calculating delay, do not just assume the one-way link delay,
> as what you really need is the delay until the sender gets to
> processing an ACK.
>
> I would be interested in the values of the following, and the effect
> that doubling or quadrupling them has on the performance.
> (I believe I have shown the defaults.)
>
> kern.ipc.maxsockbuf=2097152
> net.inet.tcp.recvbuf_max=2097152
> net.inet.tcp.sendbuf_max=2097152
> net.inet.tcp.recvspace=65536
> net.inet.tcp.sendspace=32768
>
> I understand you can't modify the server side, but I think you
> can extract the values being used there.
>
> What is the resulting window scale being used on the TCP three-way handshake?
>
> > On Thu, Jun 19, 2025 at 2:34 PM Olivier Cochard-Labbé
> > <olivier@freebsd.org> wrote:
> > >
> > > On Thu, Jun 19, 2025 at 4:31 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > >> There is the "nconnect" mount option. It might help here.
> > >
> > > Interesting!
> > >
> > > Let's try:
> > >
> > > Server side:
> > > ```
> > > mkdir /tmp/nfs
> > > mount -t tmpfs tmpfs /tmp/nfs
> > > chmod 777 /tmp/nfs/
> > > cat > /etc/exports <<EOF
> > > V4: /tmp
> > > /tmp/nfs -network 1.1.1.0/24
> > > EOF
> > > sysrc nfs_server_enable=YES
> > > sysrc nfsv4_server_enable=YES
> > > sysrc nfsv4_server_only=YES
> > > service nfsd start
> > > ```
> > >
> > > Client side:
> > > ```
> > > mkdir /tmp/nfs
> > > sysrc nfs_client_enable=YES
> > > service nfsclient start
> > > ```
> > >
> > > Now testing standard speed:
> > > ```
> > > # mount -t nfs -o noatime,nfsv4 1.1.1.30:/nfs /tmp/nfs/
> > > # netstat -an -f inet -p tcp | grep 2049 | wc -l
> > > 1
> > > # dd if=/dev/zero of=/tmp/nfs/test bs=1G count=10
> > > 10+0 records in
> > > 10+0 records out
> > > 10737418240 bytes transferred in 8.526794 secs (1259256159 bytes/sec)
> > > # rm /tmp/nfs/test
> > > # umount /tmp/nfs
> > > ```
> > >
> > > And with nconnect=16:
> > > ```
> > > # mount -t nfs -o noatime,nfsv4,nconnect=16 1.1.1.30:/nfs /tmp/nfs/
> > > # dd if=/dev/zero of=/tmp/nfs/test bs=1G count=10
> > > 10+0 records in
> > > 10+0 records out
> > > 10737418240 bytes transferred in 8.633871 secs (1243638980 bytes/sec)
> > > # rm /tmp/nfs/test
> > > # netstat -an -f inet -p tcp | grep 2049 | wc -l
> > > 16
> > > ```
> > >
> > > => No difference here, but 16 output queues were correctly used with nconnect=16.
> > > How is load-sharing done with NFS nconnect?
> > > I've tested with benchmarks/fio using parallel jobs and I don't see any improvement either.
> >
> > Here's a few other things you can try..
> >
> > On the server:
> > - add nfs_server_maxio=1048576 to /etc/rc.conf.
> >
> > On the client:
> > - put vfs.maxbcachebuf=1048576 in /boot/loader.conf
> > - use "wcommitsize=<some large value>" as an additional mount option.
> >
> > On both client and server, bump kern.ipc.maxsockbuf up a bunch.
> >
> > Once you do the mount, do
> >   # nfsstat -m
> > on the client and you should see the rsize/wsize set to 1048576
> > and a large value for wcommitsize.
> >
> > For reading, you should also use "readahead=8" as a mount option.
> >
> > Also, if you can turn down (or turn off) interrupt moderation on the
> > NIC driver, try that. (Interrupt moderation is great for data streaming
> > in one direction but is not so good for NFS, which consists of bidirectional
> > traffic of mostly small RPC messages. Every Write gets a small reply message
> > in the server->client direction to complete the Write, and delaying processing
> > of these small received messages will slow NFS down.)
> >
> > rick
> >
> > > Regards,
> > > Olivier
>
> --
> Rod Grimes
> rgrimes@freebsd.org
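To pull the knobs suggested above together, the changes would look roughly like this. This is only a sketch: the sysctl values are examples (about 4x the defaults Rodney listed), the wcommitsize value is a placeholder, and the server address and export path are taken from Olivier's test setup, not a measured recommendation.

```
# Server (/etc/rc.conf): allow 1 MB NFS I/O sizes
sysrc nfs_server_maxio=1048576

# Client (/boot/loader.conf): match the buffer cache block size
echo 'vfs.maxbcachebuf=1048576' >> /boot/loader.conf

# Both sides: raise socket buffer limits (example values, ~4x the defaults;
# add to /etc/sysctl.conf to make them persistent)
sysctl kern.ipc.maxsockbuf=8388608
sysctl net.inet.tcp.recvbuf_max=8388608
sysctl net.inet.tcp.sendbuf_max=8388608

# Client mount with the suggested options (wcommitsize value is just an example)
mount -t nfs -o noatime,nfsv4,nconnect=16,readahead=8,wcommitsize=16777216 \
    1.1.1.30:/nfs /tmp/nfs/

# Verify the negotiated rsize/wsize and wcommitsize on the client
nfsstat -m
```

Note that vfs.maxbcachebuf is a boot-time tunable, so the client has to be rebooted after editing /boot/loader.conf before the larger rsize/wsize can show up in nfsstat -m.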
OK, I was thinking about this over the weekend, and I suspect we would still need concurrency in whatever is running on top of the NFS mount for nconnect to help. So what I think would be a good test is a make -j4 buildworld in NFS-mounted /usr/src and /usr/obj, with the mounts using nconnect=4. That should make the 4 make jobs run concurrently, which in turn should use the 4 NFS connections (I am not sure what to call them), which should then work out to 4 NIC queues servicing them, hopefully on 4 free cores of an 8-core box once the scheduler does its thing. Time that, and monitor the network traffic for the build. Then repeat the whole thing using mounts that do not use nconnect; a rough sketch of the commands is below.

I have access to some 25G hardware, but only with VMs, so I can try this, but I am not exactly sure it's apples to apples. Thoughts?

--
mark saad | nonesuch@longcount.org
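For concreteness, here is a rough sketch of that comparison. The server address, export paths, and interface name are placeholders, and the mount options may need adjusting for a real setup; this is not a tested recipe.

```
# Client: mount src and obj over NFS with 4 connections each
mount -t nfs -o noatime,nfsv4,nconnect=4 1.1.1.30:/usr/src /usr/src
mount -t nfs -o noatime,nfsv4,nconnect=4 1.1.1.30:/usr/obj /usr/obj

# Confirm the extra TCP connections to the NFS server
netstat -an -f inet -p tcp | grep 2049 | wc -l

# Time the parallel build; watch per-interface traffic in another
# terminal with "systat -ifstat" or "netstat -w 1 -I <ifname>"
cd /usr/src && time make -j4 buildworld

# Then unmount, remount without nconnect, and repeat for comparison
```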