Struggling on NFS problem

Wed May 5 01:27:51 UTC 2010

Hi Rick,
Unfortunately, the machine that shows the negative number is amd64, not i386:
FreeBSD csie0.cs.ccu.edu.tw 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan  5 21:11:58 UTC 2010     root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

I'll try applying all the patches on my NFS server to see if they fix the problem. Also, I've replied to a mail with more information about the NFS server. Could you also take a look at it to see if there's anything wrong?

If the problem shows up again (I'm sure it will), I'll try to capture the NFS traffic between the server and the clients. Thank you! :)

Regards,
Cheng-Lin Yang

-----Original message-----
From:Rick Macklem <rmacklem at uoguelph.ca>
To:Cheng-Lin Yang <yuwen at exodus.cs.ccu.edu.tw>
Cc:freebsd-fs <freebsd-fs at freebsd.org>,lab <lab at cs.ccu.edu.tw>
Date:Tue, 4 May 2010 10:34:02 -0400 (EDT)
Subject:Re: Struggling on NFS problem

On Tue, 4 May 2010, Cheng-Lin Yang wrote:

> Dear all,
> Currently, we have an NFS server running FreeBSD8 with ZFS and a few workstations as NFS clients (2 * FreeBSD8 amd64 + 1 * FreeBSD7.2 i386 + 2 * Fedora + Debian). We noticed that NFS behaves strangely on the FreeBSD clients, which significantly slows down system response. The only fix we have found is to reboot the clients (the Linux clients run smoothly). So we used "nfsstat -c" on a FreeBSD client to dig into the problem and found a strange result (http://pastebin.com/K71qpEDG):
> csie0[~]# nfsstat -c
[stuff snipped]
>
> As you can see, the value of "BioW Hits" is a negative number. Shouldn't it be greater than or equal to zero? We have no idea what is causing this. Please kindly help us investigate the problem. Any suggestion is very welcome. Thank you.
>
I suspect that the negative value is just a wrap-around (assuming you're
on a 32-bit arch) and just means lots of hits. If that is the case, it
suggests a fairly heavy write load, which can be an issue for servers
using ZFS (as others have already posted about).
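To illustrate the wrap-around, here is a minimal Python sketch that assumes the "BioW Hits" counter is stored as a 32-bit signed integer (an assumption; the thread does not show the field's type). Note that under the LP64 model used on amd64, C's `int` is still 32 bits, so a counter declared as `int` could wrap on amd64 as well. `as_int32` and `as_uint32` are hypothetical helper names.

```python
# Assumption: the kernel keeps the counter in a 32-bit signed integer.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def as_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a two's-complement signed value."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n > INT32_MAX else n


def as_uint32(n: int) -> int:
    """Reinterpret a (possibly negative) 32-bit value as unsigned."""
    return n & 0xFFFFFFFF


true_hits = 2**31 + 92        # hypothetical count just past INT32_MAX
shown = as_int32(true_hits)   # what a signed 32-bit display would print
print(shown)                  # → -2147483556
print(as_uint32(shown))       # → 2147483740
```

Reading the value back as unsigned recovers the real count only if the counter has wrapped once (fewer than 2**32 hits in total); after more wraps, the true count cannot be recovered from the displayed value alone.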

There are a number of patches for FreeBSD8.0 related to NFS (one specifically
for servers using ZFS) at:
 	http://people.freebsd.org/~rmacklem

If you are using FreeBSD8.0 for the server, it would be worth trying
these patches (they are all independent, in that any of them can be
applied, in any order). (If you are using a recent stable/8, then
you should already have the patches.)

In particular, one of them fixes a case where FreeBSD clients will
get stuck looping trying to access a file after it has been deleted
on the server, because the server reported EIO instead of ESTALE for
this case.
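The failure mode above can be modelled with a toy retry loop. This is a hypothetical sketch, not the FreeBSD client's code: it assumes a client that treats a stale-file-handle error as final and every other error as transient and worth retrying. The status values are the NFSv3 codes from RFC 1813; `access_file` and `server_reply` are made-up names.

```python
from typing import Callable, Tuple

# NFSv3 status codes (RFC 1813).
NFS3_OK = 0
NFS3ERR_IO = 5
NFS3ERR_STALE = 70


def access_file(server_reply: Callable[[], int],
                max_attempts: int = 10) -> Tuple[str, int]:
    """Toy client retry loop (hypothetical, not FreeBSD's implementation).

    STALE is treated as final: the file handle no longer names a file, so
    retrying cannot help. Any other error is assumed transient and retried.
    max_attempts is an artificial cap so this sketch terminates; a stuck
    client as described above simply keeps looping.
    """
    for attempt in range(1, max_attempts + 1):
        status = server_reply()
        if status == NFS3_OK:
            return "ok", attempt
        if status == NFS3ERR_STALE:
            return "stale", attempt
    return "gave up", max_attempts


# A server replying for a file that was deleted on the server:
print(access_file(lambda: NFS3ERR_STALE))  # → ('stale', 1)
print(access_file(lambda: NFS3ERR_IO))     # → ('gave up', 10)
```

Under this model, replying with EIO instead of ESTALE turns a one-round-trip failure into an open-ended retry loop.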

If the patches don't help, please try to collect more information
from both the slow clients and server. "ps axl" on them all can be
useful. Also, you can use "tcpdump -s 0 -w <file> host <clienthost>"
to capture traffic between the slow client and server, which can be
looked at via wireshark. (tcpdump doesn't decode NFS traffic well,
but a binary capture from tcpdump loads into wireshark fine, and
wireshark does understand NFS traffic.) If you get to this point, you
can email me the "<file>" as an attachment and I can take a look at it.
If you look at it yourself, one scenario of particular interest is where
the client just keeps retrying the same NFS RPC.
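The "same RPC retried over and over" pattern can also be spotted programmatically: an ONC RPC client retransmits a request with the same transaction ID (XID), so counting how often each XID appears among NFS calls highlights retries. A minimal Python sketch, assuming the RPC messages have already been pulled out of the capture (e.g. as UDP payloads; that extraction step is not shown). `parse_call` and `find_retries` are hypothetical helpers, and the TCP handling assumes each message begins a segment and fits in one record fragment.

```python
import struct
from collections import Counter

RPC_CALL = 0          # msg_type value for a call (RFC 5531)
NFS_PROGRAM = 100003  # ONC RPC program number for NFS


def parse_call(msg: bytes, tcp: bool = False):
    """Return (xid, nfs_version, procedure) for an NFS RPC call, else None."""
    if tcp:
        msg = msg[4:]  # skip the 4-byte TCP record-marking header
    if len(msg) < 24:
        return None
    # xid, msg_type, rpcvers, prog, vers, proc are big-endian uint32s.
    xid, mtype, rpcvers, prog, vers, proc = struct.unpack(">6I", msg[:24])
    if mtype != RPC_CALL or rpcvers != 2 or prog != NFS_PROGRAM:
        return None
    return xid, vers, proc


def find_retries(messages, tcp=False, threshold=2):
    """Map xid -> (times seen, version, procedure) for repeated calls."""
    seen = Counter()
    info = {}
    for msg in messages:
        call = parse_call(msg, tcp)
        if call is None:
            continue  # replies and non-NFS traffic are ignored
        xid, vers, proc = call
        seen[xid] += 1
        info[xid] = (vers, proc)
    return {xid: (n, *info[xid]) for xid, n in seen.items() if n >= threshold}


# Synthetic example: an NFSv3 WRITE call (procedure 7) sent three times.
call = struct.pack(">6I", 0x1234, RPC_CALL, 2, NFS_PROGRAM, 3, 7) + bytes(16)
reply = struct.pack(">2I", 0x1234, 1) + bytes(22)
print(find_retries([call, call, call, reply]))  # → {4660: (3, 3, 7)}
```

Keying on the XID rather than on the procedure is what separates a genuine retransmission from a client that legitimately issues many distinct calls of the same type.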

Good luck with it and let us know how it goes, rick

----------------------------------
Cheng-Lin Yang
Sun Certified Java Programmer
High Speed Network Group Lab (HSNG)
Institute of Computer Science & Info. Engineering,
National Chung Cheng University, Taiwan
E-mail: yuwen at exodus.cs.ccu.edu.tw