Re: 100Gb performance
- In reply to: Mark Saad : "Re: 100Gb performance"
Date: Mon, 23 Jun 2025 14:46:48 UTC
On Mon, Jun 23, 2025 at 7:37 AM Mark Saad <nonesuch@longcount.org> wrote:
>
> On Sat, Jun 21, 2025 at 9:31 PM Rodney W. Grimes
> <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
>>
>> Have the TCP inflight buffers been tuned for the BDP of this link?
>>
>> When calculating delay, do not just assume the one-way link delay,
>> as what you really need is the delay until the sender gets to
>> processing an ack.
>>
>> I would be interested in the values of the following, and the
>> effect that doubling to quadrupling them has on the performance.
>> (I believe I have shown the defaults.)
>>
>> kern.ipc.maxsockbuf=2097152
>> net.inet.tcp.recvbuf_max=2097152
>> net.inet.tcp.sendbuf_max=2097152
>> net.inet.tcp.recvspace=65536
>> net.inet.tcp.sendspace=32768
>>
>> I understand you can't modify the server side, but I think you
>> can extract the values being used there.
>>
>> What is the resulting Window Scale being used on the TCP TWH
>> (three-way handshake)?
>>
>> > On Thu, Jun 19, 2025 at 2:34 PM Olivier Cochard-Labbé
>> > <olivier@freebsd.org> wrote:
>> > >
>> > > On Thu, Jun 19, 2025 at 4:31 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>> > >
>> > >> There is the "nconnect" mount option. It might help here.
>> > >
>> > > Interesting!
>> > >
>> > > Let's try:
>> > >
>> > > Server side:
>> > > ```
>> > > mkdir /tmp/nfs
>> > > mount -t tmpfs tmpfs /tmp/nfs
>> > > chmod 777 /tmp/nfs/
>> > > cat > /etc/exports <<EOF
>> > > V4: /tmp
>> > > /tmp/nfs -network 1.1.1.0/24
>> > > EOF
>> > > sysrc nfs_server_enable=YES
>> > > sysrc nfsv4_server_enable=YES
>> > > sysrc nfsv4_server_only=YES
>> > > service nfsd start
>> > > ```
>> > >
>> > > Client side:
>> > > ```
>> > > mkdir /tmp/nfs
>> > > sysrc nfs_client_enable=YES
>> > > service nfsclient start
>> > > ```
>> > >
>> > > Now testing standard speed:
>> > > ```
>> > > # mount -t nfs -o noatime,nfsv4 1.1.1.30:/nfs /tmp/nfs/
>> > > # netstat -an -f inet -p tcp | grep 2049 | wc -l
>> > > 1
>> > > # dd if=/dev/zero of=/tmp/nfs/test bs=1G count=10
>> > > 10+0 records in
>> > > 10+0 records out
>> > > 10737418240 bytes transferred in 8.526794 secs (1259256159 bytes/sec)
>> > > # rm /tmp/nfs/test
>> > > # umount /tmp/nfs
>> > > ```
>> > >
>> > > And with nconnect=16:
>> > > ```
>> > > # mount -t nfs -o noatime,nfsv4,nconnect=16 1.1.1.30:/nfs /tmp/nfs/
>> > > # dd if=/dev/zero of=/tmp/nfs/test bs=1G count=10
>> > > 10+0 records in
>> > > 10+0 records out
>> > > 10737418240 bytes transferred in 8.633871 secs (1243638980 bytes/sec)
>> > > # rm /tmp/nfs/test
>> > > # netstat -an -f inet -p tcp | grep 2049 | wc -l
>> > > 16
>> > > ```
>> > >
>> > > => No difference here, but 16 output queues were correctly used
>> > > with nconnect=16. How is load-sharing done with NFS nconnect?
>> > > I've tested with benchmarks/fio using parallel jobs and I don't
>> > > see any improvement either.
>> >
>> > Here are a few other things you can try.
>> > On the server:
>> > - add nfs_server_maxio=1048576 to /etc/rc.conf.
>> >
>> > On the client:
>> > - put vfs.maxbcachebuf=1048576 in /boot/loader.conf
>> > - use "wcommitsize=<some large value>" as an additional mount option.
>> >
>> > On both client and server, bump kern.ipc.maxsockbuf up a bunch.
>> >
>> > Once you do the mount, do
>> > # nfsstat -m
>> > on the client and you should see the rsize/wsize set to 1048576
>> > and a large value for wcommitsize.
>> >
>> > For reading, you should also use "readahead=8" as a mount option.
>> >
>> > Also, if you can turn down (or turn off) interrupt moderation on
>> > the NIC driver, try that. (Interrupt moderation is great for data
>> > streaming in one direction, but is not so good for NFS, which
>> > consists of bidirectional traffic of mostly small RPC messages.
>> > Every Write gets a small reply message in the server->client
>> > direction to complete the Write, and delaying the processing of
>> > these small received messages will slow NFS down.)
>> >
>> > rick
>> >
>> > > Regards,
>> > > Olivier
>>
>> --
>> Rod Grimes                                 rgrimes@freebsd.org
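To put rough numbers on Rod's buffer question: a 100Gbit/s link with a
1ms effective RTT (the delay until the ack gets processed) has a BDP of
about 12.5MB, so the 2MB defaults above are far too small. A minimal,
untested sketch; the RTT and the buffer values are assumptions, so
measure and adjust:
```
# BDP = bandwidth * RTT = (100e9 bits/s / 8) * 0.001 s = ~12.5 MB.
# Let socket buffers grow well past the BDP (values illustrative):
sysctl kern.ipc.maxsockbuf=67108864
sysctl net.inet.tcp.recvbuf_max=33554432
sysctl net.inet.tcp.sendbuf_max=33554432

# Window Scale on the three-way handshake: look for "wscale N" in the
# options of the SYN and SYN-ACK (replace mce0 with your interface):
tcpdump -c 4 -ni mce0 'tcp port 2049 and tcp[tcpflags] & tcp-syn != 0'
```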
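And collecting Rick's NFS-side suggestions from above into one place.
The wcommitsize value here is a guess ("some large value"); nfsstat -m
shows what was actually negotiated:
```
# Server (/etc/rc.conf), then restart nfsd:
sysrc nfs_server_maxio=1048576
service nfsd restart

# Client boot tunable (goes in /boot/loader.conf, needs a reboot):
echo 'vfs.maxbcachebuf=1048576' >> /boot/loader.conf

# Client mount with read-ahead and a large write commit size:
mount -t nfs -o noatime,nfsv4,readahead=8,wcommitsize=16777216 \
    1.1.1.30:/nfs /tmp/nfs

# Verify: rsize/wsize should show 1048576, plus the large wcommitsize:
nfsstat -m
```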
> Ok I was thinking about this over the weekend, and I suspect we would
> still need concurrency in what is running on top of the NFS mount with
> nconnect. So what I was thinking would be a good test is a
> "make -j4 buildworld" in an NFS-mounted /usr/src and /usr/obj, with
> the mounts using nconnect=4. This should make the 4 make jobs run
> concurrently, which in turn should use the 4 NFS queues (I am not sure
> what to call them), which should then work out to 4 NIC queues
> servicing them, and hopefully this is running on 4 free cores in an
> 8-core box when the scheduler is doing its thing.
>
> Time that, and monitor the network traffic for the build time. Then
> repeat this entire thing using mounts that do not use nconnect.
>
> I have access to some 25G hardware, but only with VMs, so I can try
> this, but I am not exactly sure if it's apples to apples.
>
> Thoughts ?

I don't think performance of software builds over NFS has much to do
with I/O performance or network bandwidth. This load is mostly about
Getattrs (the endless cycles of "has this .h file changed?" checks
that make does).

I have recently been quite successful at improving this with the use
of delegations, which finally appear to be stable. (These changes will
be in 15.0, but I'm not sure whether enough of it is in 14.n to see
the improvements I am getting.)

However, I do not have any access to fast networking (I test on a
100Mbps net connection), so I might be incorrect w.r.t. the above.

rick

> --
> mark saad | nonesuch@longcount.org
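For anyone who wants to try the delegation approach rick describes, a
minimal sketch. The knob names below are the ones documented for recent
FreeBSD (see nfsv4(4) and nfscbd(8)); verify them on your release,
since not all of the 15.0 work may be present in 14.n:
```
# Server: allow nfsd to issue delegations:
sysctl vfs.nfsd.issue_delegations=1

# Client: delegations require the NFSv4 callback daemon running:
sysrc nfscbd_enable=YES
service nfscbd start

# After remounting, watch delegation counts on the server while the
# build runs:
nfsstat -e -s
```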