Poor throughput using new NFS client (9.0) vs. old (8.2/9.0)

Sat Oct 27 22:03:22 UTC 2012

Thomas Johnson wrote:
> You are exactly correct. I went back to the logs, apparently when we
> tried changing the newnfs wsize/rsize parameters we _changed_ them to
> 64k (derp). Further benchmarking indicates that with newnfs, we see
> the best performance at 16k and 32k; 8k also performs quite well. 9.0
> vs. 9.1 seems very close as well, though it is difficult to draw
> conclusions from a busy production system. Good to know that this is a
> case of PEBKAC, rather than an actual problem. Thanks to everyone for
> the assistance!
> 
Ok, so with "rsize=32768,wsize=32768", the performance is about the
same as "oldnfs"? (If so, I can breathe a sigh of relief, since that
would indicate no fundamental problem with "newnfs".)
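
For reference, a minimal sketch of the kind of mount being asked about.
The server path and mount point are the ones from the test output quoted
below; the helper name nfs_mount_cmd is made up for illustration, and it
only prints the command rather than running it, so it is safe to try on
a busy client.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a mount command for the new NFS
# client with an explicit I/O size. rsize/wsize are standard mount_nfs
# options; server:/array/test and /test are taken from the quoted tests.
nfs_mount_cmd() {
    iosize="$1"    # I/O size in bytes, e.g. 32768 for 32K
    echo "mount -t nfs -o rsize=${iosize},wsize=${iosize} server:/array/test /test"
}

# Print one candidate command per I/O size worth benchmarking.
for size in 8192 16384 32768 65536; do
    nfs_mount_cmd "$size"
done
```

After unmounting /test, the printed line for the size under test can be
run as root before repeating the zip timing.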

I'll admit I'm not as convinced as bde@ that 64K won't perform about
as well as 32K for most/many sites. (I see slightly better perf. for
64K than 32K.)

Since you were seeing dramatically poorer (factor of 3) performance
for 64K, I suspect something in your network fabric couldn't handle
the large TCP segments. (At the top of the list of suspects is TSO
support, from my limited experience.)

If you can disable TSO on the network interfaces for the client and
server, it might be worth trying that and seeing if a 64K mount
works well then. (I understand that this might not be practical for
a production system.)
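
A rough sketch of what disabling TSO could look like on FreeBSD. The
interface name em0 is an assumption (substitute whatever ifconfig
reports on the client and server), and the helper name tso_off_cmds is
made up; it only prints the commands, so nothing changes until they are
run by hand as root.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that turn TSO off.
# "-tso" is the ifconfig flag that disables TCP segmentation offload on one
# interface; net.inet.tcp.tso is the system-wide sysctl switch.
tso_off_cmds() {
    ifname="$1"    # assumed interface name, e.g. em0
    echo "ifconfig ${ifname} -tso"
    echo "sysctl net.inet.tcp.tso=0"
}

tso_off_cmds em0
```

Either command should be enough for a quick test. "ifconfig em0 tso" and
"sysctl net.inet.tcp.tso=1" turn it back on, and neither setting
survives a reboot unless it is also put in the startup configuration.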

Anyhow, good to hear the problem is resolved for you.

rick

> 
> On Tue, Oct 23, 2012 at 6:37 PM, Rick Macklem < rmacklem at uoguelph.ca >
> wrote:
> 
> 
> 
> 
> Thomas Johnson wrote:
> > I built a test image based on 9.1-rc2, per your suggestion Rick. The
> > results are below. I was not able to exactly reproduce the workload
> > in my original message, so I have also included results for the new
> > (very similar) workload on my 9.0 client image as well.
> >
> > To summarize, 9.1-rc2 using newnfs seems to perform better than
> > 9.0-p4, but oldnfs appears to still be significantly faster in both
> > cases.
> >
> > I will get packet traces to Rick, but I want to get new results to
> > the list.
> >
> > -Tom
> >
> > root at test:/test-> uname -a
> > FreeBSD test.claimlynx.com 9.1-RC2 FreeBSD 9.1-RC2 #1: Fri Oct 19
> > 08:27:12 CDT 2012
> > root at builder.claimlynx.com:/usr/obj/usr/src/sys/GENERIC amd64
> >
> >
> > root at test:/-> mount | grep test
> > server:/array/test on /test (nfs)
> > root at test:/test-> zip BIGGER_PILE.zip BIG_PILE_53*
> > adding: BIG_PILE_5306.zip (stored 0%)
> > adding: BIG_PILE_5378.zip (stored 0%)
> > adding: BIG_PILE_5386.zip (stored 0%)
> > root at test:/test-> ll -h BIGGER_PILE.zip
> > -rw-rw-r-- 1 root claimlynx 5.5M Oct 23 14:05 BIGGER_PILE.zip
> > root at test:/test-> time zip BIGGER_PILE.zip 53*.zip > /dev/null
> > 0.664u 1.693s 0:30.21 7.7% 296+3084k 0+2926io 0pf+0w
> > 0.726u 0.989s 0:08.04 21.1% 230+2667k 0+2956io 0pf+0w
> > 0.829u 1.268s 0:11.89 17.4% 304+3037k 0+2961io 0pf+0w
> > 0.807u 0.902s 0:08.02 21.1% 233+2676k 0+2947io 0pf+0w
> > 0.753u 1.354s 0:12.73 16.4% 279+2879k 0+2947io 0pf+0w
> > root at test:/test-> ll -h BIGGER_PILE.zip
> > -rw-rw-r-- 1 root claimlynx 89M Oct 23 14:03 BIGGER_PILE.zip
> >
> Although the runs take much longer (I have no idea why and hopefully
> I can spot something in the packet traces), it shows about half the
> I/O ops. This suggests that it is running at the 64K rsize, wsize
> instead of the 32K used by the old client.
> 
> Just to confirm. Did you run a test using the new nfs client
> with rsize=32768,wsize=32768 mount options, so the I/O size is
> the same as with the old client?
> 
> rick
> 
> 
> 
> >
> > root at test:/test-> mount | grep test
> > server:/array/test on /test (oldnfs)
> > root at test:/test-> time zip BIGGER_PILE.zip 53*.zip > /dev/null
> > 0.645u 1.435s 0:08.05 25.7% 295+3044k 0+5299io 0pf+0w
> > 0.783u 0.993s 0:06.48 27.3% 225+2499k 0+5320io 0pf+0w
> > 0.787u 1.000s 0:06.28 28.3% 246+2884k 0+5317io 0pf+0w
> > 0.707u 1.392s 0:07.94 26.3% 266+2743k 0+5313io 0pf+0w
> > 0.709u 1.056s 0:06.08 28.7% 246+2814k 0+5318io 0pf+0w
> >
> >
> >
> > root at test:/home/tom-> uname -a
> > FreeBSD test.claimlynx.com 9.0-RELEASE-p4 FreeBSD 9.0-RELEASE-p4 #0:
> > Tue Sep 18 11:51:11 CDT 2012
> > root at builder.claimlynx.com:/usr/obj/usr/src/sys/GENERIC amd64
> >
> >
> > root at test:/test-> mount | grep test
> > server:/array/test on /test (nfs)
> > root at test:/test-> time zip BIGGER_PILE.zip 53*.zip > /dev/null
> > 0.721u 1.819s 0:31.13 8.1% 284+2886k 0+2932io 0pf+0w
> > 0.725u 1.386s 0:12.84 16.3% 247+2631k 0+2957io 0pf+0w
> > 0.675u 1.392s 0:13.94 14.7% 300+3005k 0+2928io 0pf+0w
> > 0.705u 1.206s 0:10.72 17.7% 278+2874k 0+2973io 0pf+0w
> > 0.727u 1.200s 0:18.28 10.5% 274+2872k 0+2947io 0pf+0w
> >
> >
> > root at test:/-> umount /test
> > root at test:/-> mount -t oldnfs server:/array/test /test
> > root at test:/-> mount | grep test
> > server:/array/test on /test (oldnfs)
> > root at test:/test-> time zip BIGGER_PILE.zip 53*.zip > /dev/null
> > 0.694u 1.820s 0:10.82 23.1% 271+2964k 0+5320io 0pf+0w
> > 0.726u 1.293s 0:06.37 31.5% 303+2998k 0+5322io 0pf+0w
> > 0.717u 1.248s 0:06.08 32.0% 246+2607k 0+5354io 0pf+0w
> > 0.733u 1.230s 0:06.17 31.7% 256+2536k 0+5311io 0pf+0w
> > 0.549u 1.581s 0:08.02 26.4% 302+3116k 0+5321io 0pf+0w
> >
> >
> > On Thu, Oct 18, 2012 at 5:11 PM, Rick Macklem < rmacklem at uoguelph.ca
> > >
> > wrote:
> >
> >
> >
> >
> > Ronald Klop wrote:
> > > On Thu, 18 Oct 2012 18:16:16 +0200, Thomas Johnson <
> > > tom at claimlynx.com >
> > > wrote:
> > >
> > > > We recently upgraded a number of hosts from FreeBSD 8.2 to 9.0.
> > > > Almost immediately, we received reports from users of poor
> > > > performance. The upgraded hosts are PXE-booted, with an
> > > > NFS-mounted root. Additionally, they mount a number of other NFS
> > > > shares, which is where our users work from. After a week of
> > > > tweaking rsize/wsize/readahead parameters (per guidance), it
> > > > finally occurred to me that 9.0 defaults to the new NFS client
> > > > and server. I remounted the user shares using the oldnfs file
> > > > type, and users reported that performance returned to its
> > > > expected level.
> > > >
> > > > This is obviously a workaround, rather than a solution. We would
> > > > prefer to get our hosts using the newnfs client, since
> > > > presumably oldnfs will be deprecated at some point in the
> > > > future. Is there some change that we should have made to our NFS
> > > > configuration with the upgrade to 9.0, or is it possible that
> > > > our workload is exposing some deficiency with newnfs? We tend to
> > > > deal with a huge number of tiny files (several KB in size). The
> > > > NFS server has been running 9.0 for some time (prior to the
> > > > client upgrade) without any issue. NFS is served from a zpool,
> > > > backed by a Dell MD3000, populated with 15k SAS disks. Clients
> > > > and server are connected with Gig-E links. The general hardware
> > > > configuration has not changed in nearly 3 years.
> > > >
> > > > As an example of the performance difference, here is some of the
> > > > testing I did while troubleshooting. Given a directory
> > > > containing 5671 zip files, with an average size of 15KB, I
> > > > append all files to an existing zip file. Using the newnfs
> > > > mount, I found that this operation generally takes ~30 seconds
> > > > (wall time). Switching the mount to oldnfs resulted in the same
> > > > operation taking ~10 seconds.
> > > >
> > > > tom at test-1:/test-> ls 53*zip | wc -l
> > > > 5671
> > > > tom at test-1:/test-> ll -h BIG*
> > > > -rw-rw-r-- 1 tom claimlynx 8.9M Oct 17 14:06 BIGGER_PILE_1.zip
> > > > tom at test-1:/test-> time zip BIGGER_PILE_1.zip 53*.zip
> > > > 0.646u 0.826s 0:51.01 2.8% 199+2227k 0+2769io 0pf+0w
> > > > ...reset and repeat...
> > > > 0.501u 0.629s 0:30.49 3.6% 208+2319k 0+2772io 0pf+0w
> > > > ...reset and repeat...
> > > > 0.601u 0.522s 0:32.37 3.4% 220+2406k 0+2771io 0pf+0w
> > > >
> > > > tom at test-1:/-> cd /
> > > > tom at test-1:/-> sudo umount /test
> > > > tom at test-1:/-> sudo mount -t oldnfs -o rw server:/array/test /test
> > > > tom at test-1:/-> mount | grep test
> > > > server:/array/test on /test (oldnfs)
> > > > tom at test-1:/-> cd /test
> > > > ...reset and repeat...
> > > > 0.470u 0.903s 0:13.09 10.4% 203+2229k 0+5107io 0pf+0w
> > > > ...reset and repeat...
> > > > 0.547u 0.640s 0:08.65 13.6% 231+2493k 0+5086io 0pf+0w
> > > > tom at test-1:/test-> ll -h BIG*
> > > > -rw-rw-r-- 1 tom claimlynx 92M Oct 17 14:14 BIGGER_PILE_1.zip
> > > >
> > > > Thanks!
> > > >
> > >
> > >
> > > You might find this thread from today interesting.
> > > http://lists.freebsd.org/pipermail/freebsd-fs/2012-October/015441.html
> > >
> > Yes, although I can't explain why Alexey's problem went away
> > when he went from 9.0->9.1 for his NFS server, it would be
> > interesting if Thomas could try the same thing?
> >
> > About the only thing different between the old and new NFS
> > clients is the default rsize/wsize. However, if Thomas tried
> > rsize=32768,wsize=32768 for the default (new) NFS client, then
> > that would be ruled out. To be honest, the new client uses code
> > cloned from the old one for all the caching etc (which is where
> > the clients are "smart"). They use different RPC parsing code,
> > since the new one does NFSv4 as well, but that code is pretty
> > straightforward, so I can't think why it would result in a
> > factor of 3 in performance.
> >
> > If Thomas were to capture a packet trace of the above test
> > for two clients and emailed them to me, I could take a look
> > and see if I can see what is going on. (For Alexey's case,
> > it was a whole bunch of Read RPCs without replies, but that
> > was a Linux client, of course. It also had a significant # of
> > TCP layer retransmits and out of order TCP segments in it.)
> >
> > It would be nice to figure this out, since I was thinking
> > that the old client might go away for 10.0 (can't if these
> > issues still exist).