RFC: How to fix the NFS/iSCSI vs TSO problem

Tue Apr 1 01:43:27 UTC 2014

Marcelo Araujo wrote:
> 
> Hello Rick,
> 
> 
> We have made couple more benchmarks here with additional options,
> such like '64 threads' and readahead=8.
> 
I can't remember, but if you haven't already done so, another thing to
try are these sysctls on the server:
sysctl vfs.nfsd.tcphighwater=100000
sysctl vfs.nfsd.tcpcachetimeo=300

These should reduce the server's CPU overhead (how important these
setting are depends on how current your kernel is).

> 
> Now, we add nfsstat and netstat -m into the table.
> Here attached is the full benchmark, and I can say, this patch really
> improved the read speed.
> 
I noticed a significant reduction in CPU usage on the server (about 20%).
An interesting question would be "Is this CPU reduction a result of
avoiding the m_defrag() calls in the ix driver?".
Unfortunately, the only way I can think of answering this is doing the
benchmarks on hardware without the 32 mbuf chain limitation, but I
doubt that you can do that?

Put another way, it would be interesting to compare "with vs without"
the patch on machines where the network interface can handle 35 mbufs
in the transmit chain, so there aren't m_defrag() calls being done for
the non-patched case.

Anyhow, have fun with it, rick

> 
> I understand your concern about add more one sysctl, however maybe we
> can do something like ZFS does, if it detect the system is AMD and
> have more than X of RAM it enables some options by default, or a
> kind of warning can be displayed show the new sysctl option.
> 
> 
> Of, course other people opinion will be very welcome.
> 
> 
> Best Regards,
> 
> 
> 
> 2014-03-29 6:44 GMT+08:00 Rick Macklem < rmacklem at uoguelph.ca > :
> 
> 
> 
> 
> Marcelo Araujo wrote:
> > 2014-03-28 5:37 GMT+08:00 Rick Macklem < rmacklem at uoguelph.ca >:
> > 
> > > Christopher Forgeron wrote:
> > > > I'm quite sure the problem is on 9.2-RELEASE, not 9.1-RELEASE
> > > > or
> > > > earlier,
> > > > as a 9.2-STABLE from last year I have doesn't exhibit the
> > > > problem.
> > > > New
> > > > code in if.c at line 660 looks to be what is starting this,
> > > > which
> > > > makes me
> > > > wonder how TSO was being handled before 9.2.
> > > > 
> > > > I also like Rick's NFS patch for cluster size. I notice an
> > > > improvement, but
> > > > don't have solid numbers yet. I'm still stress testing it as we
> > > > speak.
> > > > 
> > > Unfortunately, this causes problems for small i386 systems, so I
> > > am reluctant to commit it to head. Maybe a variant that is only
> > > enabled for amd64 systems with lots of memory would be ok?
> > > 
> > > 
> > Rick,
> > 
> > Maybe you can create a SYSCTL to enable/disable it by the end user
> > will be
> > more reasonable. Also, of course, it is so far safe if only 64Bits
> > CPU can
> > enable this SYSCTL. Any other option seems not OK, will be hard to
> > judge
> > what is lots of memory and what is not, it will depends what is
> > running
> > onto the system.
> > 
> I guess adding it so it can be optionally enabled via a sysctl isn't
> a bad idea. I think the largest risk here is "how do you tell people
> what the risk of enabling this is"?
> 
> There are already a bunch of sysctls related to NFS that few people
> know how to use. (I recall that Alexander has argued that folk don't
> want
> worry about these tunables and I tend to agree.)
> 
> If I do a variant of the patch that uses m_getjcl(..M_WAITOK..), then
> at least the "breakage" is thread(s) sleeping on "btallo", which is
> fairly easy to check for, althouggh rather obscure.
> (Btw, I've never reproduced this for a patch that changes the code to
> always use MJUMPAGESIZE mbuf clusters.
> I can only reproduce it intermittently when the patch mixes
> allocation of
> MCLBYTES clusters and MJUMPAGESIZE clusters.)
> 
> I've been poking at it to try and figure out how to get
> m_getjcl(..M_NOWAIT..)
> to return NULL instead of looping when it runs out of boundary tags
> (to
> see if that can result in a stable implementation of the patch), but
> haven't had much luck yet.
> 
> Bottom line:
> I just don't like committing a patch that can break the system in
> such an
> obscure way, even if it is enabled via a sysctl.
> 
> Others have an opinion on this?
> 
> Thanks, rick
> 
> 
> > The SYSCTL will be great, and in case you don't have time to do it,
> > I
> > can
> > give you a hand.
> > 
> > I'm gonna do more benchmarks today and will send another report,
> > but
> > in our
> > product here, I'm inclined to use this patch, because 10~20% speed
> > up
> > in
> > read for me is a lot. :-)
> > 
> > Thank you so much and best regards,
> > --
> > Marcelo Araujo
> > araujo at FreeBSD.org
> 
> 
> > _______________________________________________
> > freebsd-net at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to
> > " freebsd-net-unsubscribe at freebsd.org "
> > 
> 
> 
> 
> 
> --
> Marcelo Araujo
> araujo at FreeBSD.org