RFC: How to fix the NFS/iSCSI vs TSO problem

Marcelo Araujo araujobsdport at gmail.com
Tue Apr 1 01:53:52 UTC 2014


2014-04-01 9:43 GMT+08:00 Rick Macklem <rmacklem at uoguelph.ca>:

> Marcelo Araujo wrote:
> >
> > Hello Rick,
> >
> >
> > We have made a couple more benchmarks here with additional options,
> > such as '64 threads' and readahead=8.
> >
> I can't remember, but if you haven't already done so, another thing to
> try are these sysctls on the server:
> sysctl vfs.nfsd.tcphighwater=100000
> sysctl vfs.nfsd.tcpcachetimeo=300
>
> These should reduce the server's CPU overhead (how important these
> settings are depends on how current your kernel is).
>

I haven't done that; I don't have these sysctls on my system.


>
> >
> > Now we have added nfsstat and netstat -m output to the table.
> > Attached is the full benchmark, and I can say this patch really
> > improved the read speed.
> >
> I noticed a significant reduction in CPU usage on the server (about 20%).
> An interesting question would be "Is this CPU reduction a result of
> avoiding the m_defrag() calls in the ix driver?".
>

I do believe it is because we avoid m_defrag(), but I didn't dig into it
to check whether it really is m_defrag().
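
For context, this is roughly the pattern a driver such as ix follows when a
TSO chain is longer than the hardware scatter/gather limit; a minimal sketch
only, not the actual ixgbe code, with the function and constant names made
up for illustration:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/mbuf.h>

#define HW_MAX_TX_SEGS	32	/* the 32-mbuf transmit limit in this thread */

static int
hw_tx_prepare(struct mbuf **m_headp)
{
	struct mbuf *m = *m_headp, *n;
	int nsegs = 0;

	/* Count the mbufs this chain would hand to the DMA engine. */
	for (n = m; n != NULL; n = n->m_next)
		nsegs++;

	if (nsegs > HW_MAX_TX_SEGS) {
		/*
		 * Copy the whole chain into fewer, larger clusters.
		 * This copy is the extra CPU work that goes away when
		 * the reply is built from 4k clusters in the first place.
		 */
		n = m_defrag(m, M_NOWAIT);
		if (n == NULL) {
			m_freem(m);
			*m_headp = NULL;
			return (ENOBUFS);
		}
		*m_headp = n;
	}
	return (0);
}

If the chain fits, no copy happens at all, which would be consistent with
the lower CPU usage seen on the patched server.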


> Unfortunately, the only way I can think of answering this is doing the
> benchmarks on hardware without the 32 mbuf chain limitation, but I
> doubt that you can do that?
>

No, I don't have any hardware without the 32 mbuf limitation.


>
> Put another way, it would be interesting to compare "with vs without"
> the patch on machines where the network interface can handle 35 mbufs
> in the transmit chain, so there aren't m_defrag() calls being done for
> the non-patched case.
>
> Anyhow, have fun with it, rick
>

Maybe Christopher can do this benchmark as well in his environment.


>
> >
> > I understand your concern about adding one more sysctl; however, maybe
> > we can do something like ZFS does: if it detects that the system is
> > amd64 and has more than X amount of RAM, it enables some options by
> > default, or a warning can be displayed showing the new sysctl option.
> >
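
For what it's worth, a ZFS-style default would just be an architecture and
RAM check at initialization time, still overridable by a tunable. A minimal
sketch only, assuming a hypothetical nfs_use_jumbo_clusters knob and an
arbitrary 8GB threshold, neither of which is part of the actual patch:

#include <sys/param.h>
#include <sys/systm.h>

extern int nfs_use_jumbo_clusters;	/* hypothetical knob, -1 = unset */

static void
nfs_jumbo_autotune(void)
{
#if defined(__amd64__)
	/* Only pick a default if the administrator did not set one. */
	if (nfs_use_jumbo_clusters == -1)
		nfs_use_jumbo_clusters =
		    ((unsigned long)physmem * PAGE_SIZE >=
		    8UL * 1024 * 1024 * 1024);
#else
	/* Never turn it on automatically on 32-bit systems. */
	nfs_use_jumbo_clusters = 0;
#endif
}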
> >
> > Of course, other people's opinions will be very welcome.
> >
> >
> > Best Regards,
> >
> >
> >
> > 2014-03-29 6:44 GMT+08:00 Rick Macklem < rmacklem at uoguelph.ca > :
> >
> >
> >
> >
> > Marcelo Araujo wrote:
> > > 2014-03-28 5:37 GMT+08:00 Rick Macklem < rmacklem at uoguelph.ca >:
> > >
> > > > Christopher Forgeron wrote:
> > > > > I'm quite sure the problem is on 9.2-RELEASE, not 9.1-RELEASE or
> > > > > earlier, as a 9.2-STABLE from last year that I have doesn't exhibit
> > > > > the problem. New code in if.c at line 660 looks to be what is
> > > > > starting this, which makes me wonder how TSO was being handled
> > > > > before 9.2.
> > > > >
> > > > > I also like Rick's NFS patch for cluster size. I notice an
> > > > > improvement, but don't have solid numbers yet. I'm still stress
> > > > > testing it as we speak.
> > > > >
> > > > Unfortunately, this causes problems for small i386 systems, so I
> > > > am reluctant to commit it to head. Maybe a variant that is only
> > > > enabled for amd64 systems with lots of memory would be ok?
> > > >
> > > >
> > > Rick,
> > >
> > > Maybe you can create a SYSCTL so the end user can enable/disable it;
> > > that would be more reasonable. Also, of course, it is so far only safe
> > > if just 64-bit CPUs can enable this SYSCTL. Any other option seems not
> > > OK; it will be hard to judge what is a lot of memory and what is not,
> > > since it depends on what is running on the system.
> > >
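
Exposing such a knob would presumably follow the usual SYSCTL_INT pattern; a
minimal sketch only, where the vfs.nfsd parent node, the oid name, and the
variable are all chosen here for illustration rather than taken from the
patch:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

/* Hypothetical knob: off by default, so current behaviour is unchanged. */
int nfs_use_jumbo_clusters = 0;

SYSCTL_DECL(_vfs_nfsd);
SYSCTL_INT(_vfs_nfsd, OID_AUTO, use_jumbo_clusters, CTLFLAG_RW,
    &nfs_use_jumbo_clusters, 0,
    "Build large NFS replies from MJUMPAGESIZE clusters (experimental)");

With something like that in place, anyone who wants to test it is one
"sysctl vfs.nfsd.use_jumbo_clusters=1" away from the new behaviour.
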
> > I guess adding it so it can be optionally enabled via a sysctl isn't
> > a bad idea. I think the largest risk here is "how do you tell people
> > what the risk of enabling this is"?
> >
> > There are already a bunch of sysctls related to NFS that few people
> > know how to use. (I recall that Alexander has argued that folk don't
> > want to worry about these tunables, and I tend to agree.)
> >
> > If I do a variant of the patch that uses m_getjcl(..M_WAITOK..), then
> > at least the "breakage" is thread(s) sleeping on "btallo", which is
> > fairly easy to check for, although rather obscure.
> > (Btw, I've never reproduced this for a patch that changes the code to
> > always use MJUMPAGESIZE mbuf clusters.
> > I can only reproduce it intermittently when the patch mixes allocation
> > of MCLBYTES clusters and MJUMPAGESIZE clusters.)
> >
> > I've been poking at it to try and figure out how to get
> > m_getjcl(..M_NOWAIT..)
> > to return NULL instead of looping when it runs out of boundary tags
> > (to
> > see if that can result in a stable implementation of the patch), but
> > haven't had much luck yet.
> >
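
For reference, the "mixing" above comes down to a length-based choice
between the two cluster zones; a minimal sketch only, not Rick's patch, with
a made-up helper name:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Replies larger than a standard 2k cluster get a 4k page-sized cluster;
 * everything else stays in the MCLBYTES zone.  With M_WAITOK an allocator
 * shortage shows up as the thread sleeping (the "btallo" wait above)
 * rather than as a NULL return the caller could handle.
 */
static struct mbuf *
nfsm_getcluster(int len)
{

	if (len > MCLBYTES)
		return (m_getjcl(M_WAITOK, MT_DATA, M_PKTHDR, MJUMPAGESIZE));
	return (m_getcl(M_WAITOK, MT_DATA, M_PKTHDR));
}
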
> > Bottom line:
> > I just don't like committing a patch that can break the system in
> > such an
> > obscure way, even if it is enabled via a sysctl.
> >
> > Others have an opinion on this?
> >
> > Thanks, rick
> >
> >
> > > The SYSCTL will be great, and in case you don't have time to do it,
> > > I can give you a hand.
> > >
> > > I'm gonna do more benchmarks today and will send another report, but
> > > in our product here, I'm inclined to use this patch, because 10~20%
> > > speed up in read for me is a lot. :-)
> > >
> > > Thank you so much and best regards,
> > > --
> > > Marcelo Araujo
> > > araujo at FreeBSD.org
> >
> >
> > > _______________________________________________
> > > freebsd-net at freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > > To unsubscribe, send any mail to
> > > " freebsd-net-unsubscribe at freebsd.org "
> > >
> >
> >
> >
> >
> > --
> > Marcelo Araujo
> > araujo at FreeBSD.org
>



-- 
Marcelo Araujo
araujo at FreeBSD.org

