Re: 100Gb performance
- In reply to: Olivier Cochard-Labbé: "Re: 100Gb performance"
Date: Tue, 24 Jun 2025 16:36:36 UTC
On Tue, Jun 24, 2025 at 9:15 AM Olivier Cochard-Labbé
<olivier@freebsd.org> wrote:
>
>
> On Tue, Jun 24, 2025 at 5:56 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>>
>> On Tue, Jun 24, 2025 at 8:47 AM Olivier Cochard-Labbé
>> <olivier@freebsd.org> wrote:
>> >
>> >
>> > On Tue, Jun 24, 2025 at 4:45 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>> >>
>> >>
>> >> Here's how I'd configure a client (assuming it's a fairly beefy system):
>> >> In /boot/loader.conf:
>> >> vfs.maxbcachebuf=1048576
>> >>
>> >> In /etc/sysctl.conf:
>> >> kern.ipc.maxsockbuf=47370024 (or larger)
>> >> vfs.nfs.iodmax=64
>> >>
>> >> Then I'd use these mount options (along with whatever you normally use,
>> >> except don't specify rsize, wsize since it should use whatever the server
>> >> supports):
>> >> nconnect=8,nocto,readahead=8,wcommitsize=67108864 (or larger)
>> >>
>> >> To test the write rate, I'd:
>> >> # dd if=/dev/zero of=<file on mount> bs=1M count=10240
>> >> and for reading:
>> >> # dd if=<file on mount> of=/dev/null bs=1M
>> >> (but umount/mount between the two "dd"s, so nothing is cached
>> >> in the client's buffer cache)
>> >>
>> >> If you are stuck at 1.2Gbytes/sec, there's some bottleneck, but
>> >> I can't say where.
>> >>
>> >> rick
>> >> ps: The newnfs threads do write-behind and read-ahead, so there
>> >> is some parallelism for the "dd".
>> >>
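As a single copy-paste sequence, the write-then-read test described above
would look like this (a sketch assuming the 1.1.1.30:/nfs export and
/tmp/nfs mount point used below):

# write test
dd if=/dev/zero of=/tmp/nfs/data bs=1M count=10240
# umount/remount so the read is not served from the client's buffer cache
umount /tmp/nfs
mount -t nfs -o nocto,nconnect=8,readahead=8,wcommitsize=67108864 1.1.1.30:/nfs /tmp/nfs
# read test
dd if=/tmp/nfs/data of=/dev/null bs=1M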
>> >
>> > Hi,
>> >
>> > Ok, let's try all those parameters (running a June 2025 stable-week snapshot):
>> >
>> > On server and client, /etc/sysctl.conf is configured with:
>> > kern.ipc.maxsockbuf=33554432
>> > net.inet.tcp.recvbuf_max=33554432
>> > net.inet.tcp.sendbuf_max=33554432
>> > net.inet.tcp.recvspace=1048576
>> > net.inet.tcp.sendspace=524288
>> > vfs.nfs.iodmax=64
>> >
>> > Server side:
>> > nfs_server_enable="YES"
>> > nfsv4_server_enable="YES"
>> > nfsv4_server_only="YES"
>> > nfs_server_maxio="1048576"
>> > With the sysctls correctly applied:
>> > root@server:~ # sysctl vfs.nfsd.srvmaxio
>> > vfs.nfsd.srvmaxio: 1048576
>> > root@server:~ # sysctl vfs.nfs.iodmax
>> > vfs.nfs.iodmax: 64
>> >
>> > First, just measuring the server's disk speed to use as a reference:
>> > root@server:~ # dd if=/dev/zero of=/tmp/nfs/data bs=1M count=20480
>> > 20480+0 records in
>> > 20480+0 records out
>> > 21474836480 bytes transferred in 3.477100 secs (6176076082 bytes/sec)
>> > root@server:~ # units -t '6176076082 bytes' gigabit
>> > 49.408609
>> >
>> > So here, reaching about 40Gb/s with NFS will be the target.
>> >
>> > But before the NFS test, a simple iperf3 test between client and server
>> > with 16 sessions (the same count as with nconnect):
>> > root@client:~ # iperf3 -c 1.1.1.30 --parallel 16
>> > [SUM] 0.00-10.00 sec 99.1 GBytes 85.1 Gbits/sec 81693 sender
>> >
>> > The 100Gb/s link is up and seems to be working fine with iperf3.
>> >
>> > Now the NFS test, on the client side:
>> > root@client:~ # mount -t nfs -o noatime,nfsv4,nconnect=16,wcommitsize=67108864,readahead=8,nocto 1.1.1.30:/nfs /tmp/nfs/
>> > root@client:~ # nfsstat -m
>> > 1.1.1.30:/nfs on /tmp/nfs
>> > nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,nocto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
>> >
>> > => Notice here that the negotiated rsize and wsize haven't improved since
>> > the bump of vfs.nfsd.srvmaxio on the server side. Shouldn't those values
>> > be a lot bigger at this stage?
>> Yep. Did you reboot the client after putting
>> vfs.maxbcachebuf=1048576
>> in /boot/loader.conf?
>> (It's a tunable, so it needs to be set at boot time.)
>> The rsize, wsize should be 1048576.
>>
>> rick
>
>
> Indeed, I had forgotten about this tunable!
>
> With the correct settings:
>
> root@client:~ # sysctl vfs.maxbcachebuf
> vfs.maxbcachebuf: 1048576
> root@client:~ # mount -t nfs -o noatime,nfsv4,nconnect=16,wcommitsize=67108864,readahead=8,nocto 1.1.1.30:/nfs /tmp/nfs/
> root@client:~ # nfsstat -m
> 1.1.1.30:/nfs on /tmp/nfs
> nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,nocto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
>
> => The negotiated rsize and wsize are now bigger, and this fixes the read
> speed (but not the write speed):
Try bumping wcommitsize up a bunch. That's how much
writing it can do at a time.

rick
>
> root@client:~ # dd if=/dev/zero of=/tmp/nfs/data bs=1M count=20480
> 20480+0 records in
> 20480+0 records out
> 21474836480 bytes transferred in 7.574187 secs (2835266137 bytes/sec)
> root@client:~ # units -t '2835266137 bytes' gigabit
> 22.682129
> root@client:~ # umount /tmp/nfs/
> root@client:~ # mount -t nfs -o noatime,nfsv4,nconnect=16,wcommitsize=67108864,readahead=8,nocto 1.1.1.30:/nfs /tmp/nfs/
> root@client:~ # dd of=/dev/zero if=/tmp/nfs/data bs=1M count=20480
> 20480+0 records in
> 20480+0 records out
> 21474836480 bytes transferred in 4.168176 secs (5152094642 bytes/sec)
> root@client:~ # units -t '5152094642 bytes' gigabit
> 41.216757
>
> Thanks,
> Olivier
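ps: To act on the wcommitsize suggestion, a remount along these lines
should do it. The 268435456 (256Mbyte) value below is only an illustrative
guess to start from, not a tested recommendation:

umount /tmp/nfs
mount -t nfs -o noatime,nfsv4,nconnect=16,wcommitsize=268435456,readahead=8,nocto 1.1.1.30:/nfs /tmp/nfs/
# then rerun the write test:
dd if=/dev/zero of=/tmp/nfs/data bs=1M count=20480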