RFC: How to fix the NFS/iSCSI vs TSO problem

Rick Macklem rmacklem at uoguelph.ca
Thu Mar 27 21:37:44 UTC 2014


Christopher Forgeron wrote:
> I'm quite sure the problem is on 9.2-RELEASE, not 9.1-RELEASE or
> earlier, as a 9.2-STABLE from last year I have doesn't exhibit the
> problem. New code in if.c at line 660 looks to be what is starting
> this, which makes me wonder how TSO was being handled before 9.2.
> 
> I also like Rick's NFS patch for cluster size. I notice an
> improvement, but don't have solid numbers yet. I'm still stress
> testing it as we speak.
> 
Unfortunately, this causes problems for small i386 systems, so I
am reluctant to commit it to head. Maybe a variant that is only
enabled for amd64 systems with lots of memory would be ok?

> 
> On Wed, Mar 26, 2014 at 11:44 PM, Marcelo Araujo
> <araujobsdport at gmail.com> wrote:
> 
> > Hello All,
> >
> >
> > 2014-03-27 8:27 GMT+08:00 Rick Macklem <rmacklem at uoguelph.ca>:
> > >
> > > Well, bumping it from 32->35 is all it would take for NFS (can't
> > > comment w.r.t. iSCSI). ixgbe uses 100 for the 82598 chip and 32 for
> > > the 82599 (just so others aren't confused by the above comment). I
> > > understand your point was w.r.t. using 100 without blowing the
> > > kernel stack, but since the testers have been using "ix" with the
> > > 82599 chip which is limited to 32 transmit segments...
> > >
> > > However, please increase any you know can be safely done from
> > > 32->35, rick
> > >
> > >
> > I have plenty of machines using the Intel X540, which is based on
> > the 82599 chipset. I have applied Rick's patch to ixgbe to check
> > whether the packet size is bigger than 65535 or the number of
> > clusters is bigger than 32. So far, on FreeBSD 9.1-RELEASE, this
> > problem does not happen.
> >
> > Unfortunately my whole environment here is based on 9.1-RELEASE,
> > with some merges from 10-RELEASE such as NFS and IXGBE.
> >
I can't see why it couldn't happen on 9.1 or earlier, since it just
uses IP_MAXPACKET in tcp_output().

However, to make it happen NFS has to do a read reply (server) or
write request (client) that is a little under 64K bytes. Normally
the default will be a full 64K bytes, so for the server side it
would take a read of a file where the EOF is just shy of the 64K
boundary. For the client write it would be a write of a partially
dirtied block where most of the block has been dirtied. (Software
builds generate a lot of partially dirtied blocks, but I don't know
what else would.) For sequential writing it would be a file that
ends just shy of a 64K boundary (similar to the server side) being
written.
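
As a rough illustration of the condition described above, here is a
minimal sketch in C. It is a simplified model, not kernel code.
IP_MAXPACKET and MCLBYTES mirror the values in the FreeBSD headers, but
HDR_MBUFS (extra mbufs assumed for the RPC and TCP/IP headers), the
function names, and the header length used in the example are all
hypothetical. The model simply encodes the claim that only a send small
enough to go out as one TSO segment, yet large enough to need more
mbufs than the driver's transmit segment limit, triggers the problem.

```c
#include <stddef.h>

/* Values copied from FreeBSD headers so the sketch compiles on its own. */
#define IP_MAXPACKET 65535 /* netinet/ip.h: maximum IP packet size */
#define MCLBYTES     2048  /* sys/param.h: standard mbuf cluster size */

/* Assumption for illustration: mbufs carrying RPC and TCP/IP headers. */
#define HDR_MBUFS    3

/* Number of mbufs in the chain for a payload of len bytes (model only). */
static size_t
mbufs_for_payload(size_t len)
{
	return ((len + MCLBYTES - 1) / MCLBYTES + HDR_MBUFS);
}

/* True if payload plus headers fits in a single IP_MAXPACKET-sized send. */
static int
fits_one_tso_segment(size_t len, size_t hdrlen)
{
	return (len + hdrlen <= IP_MAXPACKET);
}

/*
 * True if, under this model, the send goes out as one TSO segment whose
 * mbuf chain is longer than the driver's transmit segment limit.
 */
static int
exceeds_limit(size_t len, size_t hdrlen, size_t max_segs)
{
	return (fits_one_tso_segment(len, hdrlen) &&
	    mbufs_for_payload(len) > max_segs);
}
```

Under these assumptions, exceeds_limit(65000, 200, 32) is true (32
clusters plus 3 header mbufs is 35), while the same send with a limit
of 35 is not flagged, which is consistent with the 32->35 suggestion.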

I think it is more likely your NFS file activity and not 9.1 vs 9.2
that avoids the problem. (I suspect there are quite a few folk running
NFS on 9.2 or later with these ix chips who don't see the problem either.)
Fortunately you (Christopher) were able to reproduce it, so the problem
could be isolated.

Thanks everyone for your help with this, rick

> > Also I have applied the patch that Rick sent in another email with
> > the subject 'NFS patch to use pagesize mbuf clusters', and we can
> > see some performance boost over 10Gbps Intel. However, here at the
> > company we are still doing benchmarks. If someone wants my
> > benchmark results, I can send them later.
> >
> > I'm wondering if this update on ixgbe from 32->35 could also be
> > applied to versions < 9.2. I'm thinking that this problem arises
> > only on 9-STABLE and consequently on 9.2-RELEASE. And fortunately
> > or not, 9.1-RELEASE doesn't share it.
> >
> > Best Regards,
> > --
> > Marcelo Araujo
> > araujo at FreeBSD.org
> > _______________________________________________
> > freebsd-net at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to
> > "freebsd-net-unsubscribe at freebsd.org"
> >

