NFS on ZFS pure SSD pool

Rick Macklem rmacklem at uoguelph.ca
Thu Aug 29 13:49:39 UTC 2013


Sam Fourman Jr wrote:
> 
> 
> On Wed, Aug 28, 2013 at 2:27 PM, Eric Browning
> <ericbrowning at skaggscatholiccenter.org> wrote:
> 
> 
> Rick,
> 
> Sam and I applied the patch (kernel now at r254983M) and set
> vfs.nfsd.tcphighwater=5000
> in sysctl.conf, and my CPU is still slammed. Should I up it to 10000?
> 
> 
> 
> Hello, list
> I am helping Eric debug and test this situation as much as I can.
> 
> 
> So to clarify and recap, here is the situation:
> 
> 
> 
> This is a production setting, in a school, where 200+ students
> use a mix of systems, with the primary client being OS X 10.8,
> and the primary workload is NFS.
> 
I haven't touched a Mac in several years, but I think Finder probes at
regular intervals to see if directories (oops, I meant folders;-) have
changed. I think there is a way to increase the interval between
probes.
Also, I think there are tunables for the ZFS metadata cache; increasing
its size might help, since the probes will be checking metadata
(attributes).
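The metadata-cache tunables mentioned above might be set along these lines. This is only a sketch for FreeBSD 9.x ZFS; the example sizes are placeholders, and the tunable names and your system's current values should be verified with "sysctl -d vfs.zfs" before use:

```shell
# /boot/loader.conf -- sketch, not a recommendation; size against RAM.
# Raise the ARC metadata limit so attribute-heavy NFS probing stays cached:
vfs.zfs.arc_meta_limit="8G"     # example value; default is a fraction of arc_max
vfs.zfs.arc_max="24G"           # overall ARC cap (example value)
```

These are boot-time tunables, so a reboot is needed for them to take effect.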

If this is contributing to the heavy load, I'd expect "nfsstat -e -s"
to show a large count for Getattr. (I vaguely remember that NFSv4 mounts
were in use. The counts of everything are larger for NFSv4, since they
are counts of operations and not RPCs. Each NFSv4 RPC is a compound made
up of N operations.)
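Checking that could look something like the following sketch. The flags are per the FreeBSD 9.x nfsstat(8); the counters are cumulative since boot, so comparing two samples gives a rough rate:

```shell
# Extended (new NFS server) server-side statistics; look at Getattr:
nfsstat -e -s

# Sample twice, a minute apart, and diff to see which ops are growing fastest:
nfsstat -e -s > /tmp/nfs1
sleep 60
nfsstat -e -s > /tmp/nfs2
diff /tmp/nfs1 /tmp/nfs2
```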

Just something that might be worth looking at, rick

> 
> from what I can see there should be plenty of disk I/O
> 
> these are Intel SSD disks..
> 
> 
> The server is running FreeBSD 9-STABLE r254983 (we patched it last
> night)
> 
> with this patch
> http://people.freebsd.org/~rmacklem/drc4-stable9.patch
> 
> 
> 
> Here is a full dmesg for reference (it says FreeBSD 9.1, but we have
> since upgraded and applied the above patch):
> 
> 
> https://gist.github.com/sfourman/6373059
> 
> 
> The main problem is we need better performance from NFS, but
> it would appear the server is starved for CPU cycles....
> 
> 
> With only a few clients the server is lightning fast, but
> with 25 users logging in this morning (students in class) the server
> went right to 1200% CPU load,
> with about 300% more going to "intr", and it pretty much stayed there
> all day until they logged out between classes.
> 
> 
> So that works out to be somewhere between 2 to 4 users per core
> 
I'm not the guy to be able to help with how to do it, but profiling the
running kernel to try to see where the CPU is being used could help.
(At this point, I suspect it isn't in the nfs code, since the DRC seems
 to be the only CPU hog and I think the patch you are already using fixes
 that.)
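One way to do that profiling could be hwpmc(4)/pmcstat(8); this is a sketch only, assuming the hwpmc module is available and that the "unhalted-cycles" event alias works on this CPU (check pmc(3) for your hardware):

```shell
# Load the hardware performance-counter driver:
kldload hwpmc

# Sample system-wide on unhalted CPU cycles for ~30 seconds:
pmcstat -S unhalted-cycles -O /tmp/sample.out &
sleep 30
kill %1

# Post-process the samples into a callgraph to see where cycles go:
pmcstat -R /tmp/sample.out -G /tmp/callgraph.txt
head -40 /tmp/callgraph.txt
```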

Good luck with it, rick

> 
> 
> during today's classes, different settings for vfs.nfsd.tcphighwater
> were tested;
> various values from 5,000 up to 50,000 were used while a load was
> present, but
> the processor load didn't change.
> 
> 
> Garrett stated that he tried values upwards of 100,000... this can
> be tested tomorrow
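For what it's worth, the tunable can be adjusted at runtime without a reboot; a sketch, assuming the sysctl name used earlier in this thread:

```shell
# Change the DRC high-water mark on the fly and read it back:
sysctl vfs.nfsd.tcphighwater=100000
sysctl vfs.nfsd.tcphighwater

# Make the setting persistent across reboots:
echo 'vfs.nfsd.tcphighwater=100000' >> /etc/sysctl.conf
```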
> 
> 
> It would be helpful if we could get some direction, on other things
> we might try tomorrow.
> 
> 
> one idea is, the server has several igb Ethernet interfaces with 8
> queues per interface;
> is it worth forcing the interfaces down to one queue?
> 
> 
> Is NFS even set up to understand multi-queue network devices? Or
> doesn't it matter?
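If forcing a single queue is tried, a sketch of the boot-time tunable (name per the FreeBSD 9.x igb(4) driver; takes effect at the next boot, and the default of 0 lets the driver pick based on CPU count):

```shell
# /boot/loader.conf -- limit igb(4) to one queue per interface:
hw.igb.num_queues=1
```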
> 
> 
> Any thoughts are appreciated --
> 
> Sam Fourman Jr.
> 


More information about the freebsd-fs mailing list