Trouble with NFSd under 6.1-Stable, any ideas?

Rong-en Fan grafan at gmail.com
Mon May 22 14:43:52 PDT 2006


On 5/14/06, Kris Kennaway <kris at obsecurity.org> wrote:
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >
> >    Hello All,
> >
> >  I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with.   I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> >
> >  This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> >
> >  Now after doing this, something is wrong with NFS.   It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding.  A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> >
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >
> >
> >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
>
> Use tcpdump and related tools to find out what traffic is being sent.
>
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so it is
> unclear why an update would cause the problem to suddenly occur.
>
> Kris

Hi Kris and Howard,

As I posted a few days ago, I am seeing problems similar to Howard's
(some details are in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After a binary search of
the source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit in that window is to kern/vfs_lookup.c, an MFC of rev 1.90
and 1.91. With the 04.30 03.57 source plus a manually patched vfs_lookup.c
(rev 1.90), the same problem occurs.
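
The binary search over checkout dates can be sketched roughly as below.
This is only an illustration of the idea, not the exact procedure used:
the function name, the is_bad callback, and the commit time in the demo are
all made up, and in practice is_bad would mean checking out the tree as of
that timestamp, rebuilding, and rerunning the NFS test by hand.

```python
from datetime import datetime, timedelta

def bisect_checkout_time(good, bad, is_bad, resolution=timedelta(minutes=1)):
    """Narrow the window between a known-good and a known-bad timestamp.

    is_bad(ts) is a hypothetical callback that returns True when the
    source tree as of `ts` shows the problem.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_bad(mid):
            bad = mid    # problem already present at mid
        else:
            good = mid   # problem introduced after mid
    return good, bad

# Demo with a pretend commit time (an assumption, not the real one).
pretend_commit = datetime(2006, 4, 30, 3, 58, 30)
ok, broken = bisect_checkout_time(
    datetime(2006, 4, 1), datetime(2006, 5, 14),
    lambda ts: ts >= pretend_commit,
)
print("last ok:", ok, "first bad:", broken)
```

Each round halves the window, so even a six-week range needs only about
sixteen builds to get down to a window of a minute.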

Let me restate the problems I'm seeing:

1. a client (either Linux 2.6.16 or FreeBSD 6.1) runs du on
   an NFS directory
2. on the server side, nfsd starts to eat lots of CPU
3. the du finishes
4. on the server side, nfsd still eats lots of CPU, but there is no
   NFS traffic. Even after waiting 5 minutes, nfsd is still
   "running" and eating lots of CPU.
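
To check step 4 without watching top by hand, one could save top's output
and pull out the WCPU column for nfsd. A minimal sketch, assuming the column
layout from Howard's snippet quoted above (COMMAND last, WCPU just before
it); the function name is made up for illustration.

```python
def nfsd_wcpu(top_output):
    """Return the WCPU percentages of all nfsd lines in top-style output."""
    usage = []
    for line in top_output.splitlines():
        fields = line.split()
        # Expect "... WCPU COMMAND", with a trailing '%' on the WCPU field.
        if len(fields) >= 2 and fields[-1] == "nfsd" and fields[-2].endswith("%"):
            usage.append(float(fields[-2].rstrip("%")))
    return usage

sample = "  497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd"
print(nfsd_wcpu(sample))  # prints [188.48]
```

Sampling this a few minutes after the du has finished would show whether
nfsd is still busy while the wire is idle.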

The FreeBSD 6.1R client uses a UDP mount with fstab options like
"rw,-L,nosuid,bg,nodev". The Linux client uses a UDP mount with
fstab options like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
The server's kernel config is at

http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports:
  /export/dir1 host1 host2...
  /export/dir2 host1 host2...

/etc/rc.conf:
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

/etc/fstab:
/dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server also uses amd to mount some backup directories
from another NFS server. The amd.conf is:

[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes

[/backup]
map_options = type:=direct
map_name = /etc/amd.direct

/etc/amd.direct:
/defaults
opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup          type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}


If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what happens
when nfsd eats lots of CPU, but in vain: they do not return anything.

Regards,
Rong-En Fan

