Processes stuck in 'disk wait' state

Mark Morley mark at islandnet.com
Tue Mar 14 03:17:40 UTC 2006


Got what seems to be an NFS issue on two different servers where
processes seem to be deadlocked, stuck in a 'disk wait' state.

The first server was running fine for a couple years with 4.x, but
then a month or so ago I noticed a bunch of cron-spawned processes
stuck in disk wait ('find' commands, for example).  Any command I
tried that accessed the NFS drives (sync, du, ls, etc.) would immediately
lock up in the same state.  These processes could not be killed without
rebooting.
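
For anyone trying to confirm the same symptom, here is a minimal sketch
(not something that was run on the servers described here) that lists
processes whose state starts with 'D' (disk wait) together with their wait
channel.  The function names are hypothetical, and it assumes the local
ps(1) accepts the pid, state, wchan and command keywords for -o and prints
a non-empty wchan column for every process.

```python
import subprocess


def disk_wait_processes(ps_output):
    """Return (pid, state, wchan, command) for each process in disk wait.

    Expects text in the layout produced by
    `ps -ax -o pid,state,wchan,command` (one header line, then one
    process per line).  Assumes the wchan column is never blank.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)         # command may contain spaces
        if len(fields) < 4:
            continue
        pid, state, wchan, command = fields
        if state.startswith("D"):            # 'D' = uninterruptible disk wait
            stuck.append((int(pid), state, wchan, command))
    return stuck


def snapshot():
    """Run ps once and return the processes currently in disk wait."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "pid,state,wchan,command"],
        capture_output=True, text=True, check=True,
    )
    return disk_wait_processes(result.stdout)
```

Calling snapshot() a few times, some seconds apart, and comparing the
results shows whether the same PIDs stay parked on the same wait channel,
which is the detail most worth including in a report.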

Thinking it might be a bad disk, and since we were planning an upgrade
anyway, we built a new server.  New motherboard/CPU, new boot drive,
replaced the data drives with a new RAID system, etc.  The new system is
an AMD/64; the old one was AMD/i386.  The new one is running 6.1.  It ran
great for about 3 weeks, then the same thing happened.  It has nothing
in common with the old server as far as hardware goes.

Both servers had the same set of clients, which are all FreeBSD 4.11
using TCP.  These haven't changed, except they are getting busier all the
time.  The NFS traffic is on a dedicated switched gigabit network, nice
and fast.

The problem seems to occur when a file system intensive task is run right
on the NFS server itself during heavy NFS usage.  Things like using 'find'
to delete old files, or using 'du' on a large set of directories, etc.

Does this ring any bells for anyone?  Are there any known issues with the
NFS code that would cause this?  Any known solutions?

Mark

--
Mark Morley
Owner / Administrator
Islandnet.com
