NFS hangs (7.3)

Oliver Fromme olli at lurza.secnetix.de
Wed Nov 17 17:05:54 UTC 2010


I've got a problem on a server farm.  Every now and then,
some NFS mounts hang.  This happens after a few days or
after a few weeks.  All processes trying to access files
from the hanging mount go to state "D" and freeze.  The
only way to resolve the problem is to reboot the server.

"umount -f" also hangs and does not remove the hanging
mount (even though it disappears from the output of the
mount(8) command).

Here's one example from an attempt to run df(1), which
also hangs:

ps -uww:
USER   PID %CPU %MEM  VSZ  RSS TT  STAT STARTED    TIME COMMAND
root 61930  0.0  0.0 5728 1280 p4- D     5:15PM 0:00.01 /bin/df

ps -lww:
UID   PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 61930    1   0  -4  0 5728 1280 nfs    D    p4- 0:00.01 /bin/df

procstat -kk:
  PID    TID COMM       TDNAME     KSTACK
61930 100489 df         -          mi_switch+0x18e sleepq_wait+0x3b
_sleep+0x367 acquire+0x7c _lockmgr+0x203 VOP_LOCK1_APV+0x46
_vn_lock+0x83 vget+0xf9 vfs_hash_get+0xf4 nfs_nget+0xa8 nfs_statfs+0x8b
__vfs_statfs+0x2b kern_getfsstat+0x2d6 syscall+0x256 Xfast_syscall+0xab
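The wrapped KSTACK column is easier to follow when read outermost-first. Below is a minimal Python sketch that splits the `procstat -kk` frames into a call path. It assumes the frames are listed innermost first, as in the paste above, and that each one has the form `function+0xoffset`. The helper names `parse_kstack` and `call_path` are hypothetical.

```python
import re

# KSTACK column of the df process (PID 61930) pasted above,
# including the original line wrapping.
DF_KSTACK = """mi_switch+0x18e sleepq_wait+0x3b
_sleep+0x367 acquire+0x7c _lockmgr+0x203 VOP_LOCK1_APV+0x46
_vn_lock+0x83 vget+0xf9 vfs_hash_get+0xf4 nfs_nget+0xa8 nfs_statfs+0x8b
__vfs_statfs+0x2b kern_getfsstat+0x2d6 syscall+0x256 Xfast_syscall+0xab"""

# Assumption: every frame looks like "name+0xhexoffset".
FRAME = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\+0x([0-9a-f]+)$")


def parse_kstack(text):
    """Split a KSTACK string into (function, offset) pairs, innermost first.

    Frames are whitespace-separated, so line wrapping does not matter.
    """
    frames = []
    for token in text.split():
        m = FRAME.match(token)
        if m is None:
            raise ValueError(f"unexpected frame: {token!r}")
        frames.append((m.group(1), int(m.group(2), 16)))
    return frames


def call_path(frames):
    """Return function names outermost first (syscall entry down to the sleep)."""
    return [name for name, _ in reversed(frames)]


if __name__ == "__main__":
    print(" -> ".join(call_path(parse_kstack(DF_KSTACK))))
```

For the df process, this prints the path from `Xfast_syscall` through `kern_getfsstat`, `nfs_statfs`, `nfs_nget`, `vfs_hash_get` and `vget` down to `_vn_lock` and `_lockmgr`. In other words, the thread is asleep waiting for a vnode lock.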

And this is a hanging umount(8) command (I used fsid syntax,
hoping that it would work better than accessing the mount by
its path, but it doesn't seem to make a difference):

ps -uww:
USER   PID %CPU %MEM  VSZ  RSS TT  STAT STARTED    TIME COMMAND
root 62791  0.0  0.0 4640 1272 p4- D     5:18PM 0:00.08 umount -f a5ff000505000000

ps -lww:
UID   PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 62791    1   0  -4  0 4640 1272 vfsloc D    p4- 0:00.08 umount -f a5ff000505000000

procstat -kk:
  PID    TID COMM       TDNAME     KSTACK
62791 100239 umount     -          mi_switch+0x18e sleepq_wait+0x3b
_sleep+0x367 _lockmgr+0x4f3 dounmount+0x474 unmount+0x30a
syscall+0x256 Xfast_syscall+0xab
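To list every stuck process at once, the `ps -lww` output can be filtered for processes in state D and their wait channels. Below is a minimal Python sketch. It assumes the column layout shown above, with the MWCHAN column always filled in and COMMAND as the last column, which may contain spaces. The helper name `stuck_processes` is hypothetical. On a live system the text could come from `ps -axlww` instead of the pasted sample.

```python
# Both `ps -lww` listings pasted above, combined under one header.
SAMPLE = """\
UID   PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 61930    1   0  -4  0 5728 1280 nfs    D    p4- 0:00.01 /bin/df
  0 62791    1   0  -4  0 4640 1272 vfsloc D    p4- 0:00.08 umount -f a5ff000505000000
"""


def stuck_processes(ps_text):
    """Return (pid, mwchan, command) for every process whose STAT starts with D."""
    lines = ps_text.strip("\n").splitlines()
    header = lines[0].split()
    ncols = len(header)
    result = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so split at
        # most ncols - 1 times to keep the full command line in one field.
        row = dict(zip(header, line.split(None, ncols - 1)))
        if row["STAT"].startswith("D"):
            result.append((int(row["PID"]), row["MWCHAN"], row["COMMAND"]))
    return result


if __name__ == "__main__":
    for pid, chan, cmd in stuck_processes(SAMPLE):
        print(pid, chan, cmd)
```

Run against the pasted listings, it reports PID 61930 (/bin/df) waiting on `nfs` and PID 62791 (the umount) waiting on `vfsloc`.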

The machine is quite busy.  The hangs seem to always occur
in the night when lots of cron jobs are running.  The machine
has 221 NFS mounts and 26 nullfs mounts, and it has 26 jails,
if that matters.  All NFS shares are mounted from a virtual
filer running on a NetApp filer.  The mounts use the default
settings, so they should be v3 TCP (this is the default,
right?).  The only extra option we use is -L in order to
"fake" locking locally.

The machine is running FreeBSD 7.3-PRERELEASE-20100311 amd64.
Updating is somewhat complicated in that server farm, and I'm
not sure it would help, so I haven't tried it so far.

Any suggestions or ideas?

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registry court Munich, HRA 74606,  Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Registry court
Munich, HRB 125758,  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"Software gets slower faster than hardware gets faster."
        -- Niklaus Wirth

