Trouble with NFSd under 6.1-Stable, any ideas?
grafan at gmail.com
Mon May 22 14:43:52 PDT 2006
On 5/14/06, Kris Kennaway <kris at obsecurity.org> wrote:
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> > Hello All,
> > I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with. I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> > This has all been running fine up till a few days ago, when all of a sudden
> > NFS slowed to a crawl, with CPU usage so high the box almost appears to freeze.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> > Now after doing this, something is wrong with NFS. It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding. A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> > 497 root 1 4 0 1252K 780K - 2 50:42 188.48% nfsd
> > This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams; heck, NFS used to scream till I updated. I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
> Use tcpdump and related tools to find out what traffic is being sent.
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so it is
> unclear why an update would cause the problem to suddenly occur.
Hi Kris and Howard,
As I posted a few days ago, I have a problem similar to Howard's
(some details are in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After a binary search over
the source tree by checkout date, I found that
RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad
The only commit between those two is to kern/vfs_lookup.c, an MFC of
revs 1.90 and 1.91. With the 04.30 03.57 source plus vfs_lookup.c
manually patched up to rev 1.90, the same problem occurs.
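The date bisection described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in this report: `bisect_dates`, `cvs_stamp` and the `is_bad` callback are hypothetical names, and `is_bad` stands in for "check out the tree at that date, rebuild world and kernel, reboot, and rerun the du-over-NFS test". Bisecting over fixed one-minute steps is a simplification; with CVS it is enough to test the timestamps at which commits actually landed.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple


def bisect_dates(good: datetime, bad: datetime,
                 is_bad: Callable[[datetime], bool],
                 step: timedelta = timedelta(minutes=1)) -> Tuple[datetime, datetime]:
    """Return (last_good, first_bad), two checkout dates exactly `step` apart.

    Assumes `good` is known good, `bad` is known bad, and the gap between
    them is a whole number of steps.
    """
    if not good < bad or (bad - good) % step:
        raise ValueError("need good < bad and a gap that is a multiple of step")
    lo, hi = 0, (bad - good) // step  # offsets from `good`, in whole steps
    # Invariant: good + lo*step does not show the bug, good + hi*step does.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(good + mid * step):
            hi = mid
        else:
            lo = mid
    return good + lo * step, good + hi * step


def cvs_stamp(when: datetime) -> str:
    """Format a date the way it is written above, e.g. 2006.04.30.04.00."""
    return when.strftime("%Y.%m.%d.%H.%M")
```

For a hypothetical 20-day window at one-minute resolution (28,800 steps), this needs at most 15 rebuild-and-test cycles, since each test halves the remaining range.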
To recap the problem I'm seeing:
1. A client (either Linux 2.6.16 or FreeBSD 6.1) runs du on
an NFS directory.
2. On the server side, nfsd starts to eat lots of CPU.
3. The du finishes.
4. On the server side, nfsd still eats lots of CPU, but there is no
NFS traffic. Even after waiting 5 minutes, nfsd is still
"running" and eating lots of CPU.
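Step 4 can be checked from a script rather than by watching top. The sketch below is an illustration only: the helper names are hypothetical, it assumes FreeBSD's ps accepts the `pid,pcpu,comm` keywords, and the 50% threshold is an arbitrary choice.

```python
import subprocess
from typing import List, Tuple


def nfsd_cpu(ps_output: str) -> List[Tuple[int, float]]:
    """Return (pid, %cpu) for each nfsd row of `ps -ax -o pid,pcpu,comm` output."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 2)         # PID, %CPU, then the command name
        if len(fields) == 3 and fields[2].strip() == "nfsd":
            rows.append((int(fields[0]), float(fields[1])))
    return rows


def sample_nfsd_cpu() -> List[Tuple[int, float]]:
    """Run ps once on the server and return the nfsd rows."""
    out = subprocess.run(["ps", "-ax", "-o", "pid,pcpu,comm"],
                         capture_output=True, text=True, check=True).stdout
    return nfsd_cpu(out)


def still_busy(rows: List[Tuple[int, float]], threshold: float = 50.0) -> bool:
    """True if any nfsd is at or above `threshold` percent CPU."""
    return any(pcpu >= threshold for _, pcpu in rows)
```

The idea is to call `sample_nfsd_cpu()` a few times after the client's du has exited; if `still_busy()` keeps returning True while tcpdump shows no NFS packets, that matches step 4.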
The FreeBSD 6.1R client uses a UDP mount, with fstab options
"rw,-L,nosuid,bg,nodev". The Linux client also uses a UDP mount, with
fstab options "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
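The two client option strings differ quite a bit; for instance, only the Linux one sets the NFS version and rsize/wsize explicitly. A minimal sketch for lining them up side by side follows. The function names are hypothetical, and the code only splits strings: it does not know what any option means to mount_nfs or to Linux mount.

```python
from typing import Dict, Optional, Tuple, Union

Option = Union[str, bool]


def parse_mount_options(opts: str) -> Dict[str, Option]:
    """Split an fstab options field: bare flags map to True, key=value keeps the value."""
    result: Dict[str, Option] = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def option_differences(a: Dict[str, Option],
                       b: Dict[str, Option]) -> Dict[str, Tuple[Optional[Option], Optional[Option]]]:
    """Map each option that differs to its (a, b) values; None means absent."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```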
The server's kernel conf is at
Some related configuration files:
/export/dir1 host1 host2...
/export/dir2 host1 host2...
nfs_server_flags="-u -t -n 16"
mountd_flags="-r -l -n"
/dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
The NFS server is also using amd to mount some backup directories
from another NFS server. The amd.conf is:
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes
map_options = type:=direct
map_name = /etc/amd.direct
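Because the server is itself an NFS client through amd (UDP, version 3 per the settings above), its own mounts may also appear in a tcpdump taken on that box. A minimal sketch for pulling those settings out of an amd.conf excerpt is below; `parse_amd_conf` is a hypothetical helper, and it flattens any `[ section ]` headers, so a key repeated in several sections keeps only its last value.

```python
from typing import Dict


def parse_amd_conf(text: str) -> Dict[str, str]:
    """Collect `key = value` settings, skipping blanks, # comments and [ section ] lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            settings[key.strip()] = value.strip()
    return settings
```

Splitting on the first `=` only keeps values such as `type:=direct` intact.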
If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what happens
when nfsd eats lots of CPU, but in vain: they do not return anything.