Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000

Tue Sep 16 06:02:15 UTC 2008

On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <jhb at freebsd.org> wrote:

> On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > I am currently running a mail server with a NetApp filer as backend
> > storage. From time to time the whole system gets stuck for 3-5 minutes,
> > after which everything recovers normally. While it is stuck, ps auxw
> > shows 200-300 mail delivery agent (MDA) processes in "D" state, and
> > df does not respond either.
>
> Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> threads when they hang?  If it is "lockf", then make sure you have an
> up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
>

Thanks for your suggestion. According to 'ps axl', all the processes in "D"
state are waiting in "nfs" or "nfsreq". Can you give me a hint on how to dig
further into the problem?
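
A minimal sketch of how the wait channels could be tallied from saved
'ps axl' output, so that the split between "nfs" and "nfsreq" is visible at a
glance. The column names (MWCHAN/WCHAN, STAT) and the sample rows are
assumptions for illustration, not output captured from the system in this
thread:

```python
from collections import Counter

# Invented sample in the general shape of `ps axl` output (hypothetical data).
SAMPLE = """\
 UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
  26  1201     1   0  -4  0  5800  2100 nfsreq D     ??    0:00.01 mda
  26  1202     1   0  -4  0  5800  2100 nfsreq D     ??    0:00.01 mda
  26  1203     1   0  -4  0  5800  2100 nfs    D     ??    0:00.02 mda
   0  1300     1   0  44  0  3200  1000 select Ss    ??    0:00.10 sshd
"""


def count_wait_channels(ps_output: str) -> Counter:
    """Count processes whose state starts with 'D', grouped by wait channel."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    header = lines[0].split()
    # Assumption: the wait channel column is labelled MWCHAN or WCHAN.
    wchan_idx = next(i for i, name in enumerate(header)
                     if name in ("MWCHAN", "WCHAN"))
    stat_idx = header.index("STAT")
    counts = Counter()
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) <= max(wchan_idx, stat_idx):
            continue  # skip short or malformed rows
        if fields[stat_idx].startswith("D"):
            counts[fields[wchan_idx]] += 1
    return counts


if __name__ == "__main__":
    for wchan, n in count_wait_channels(SAMPLE).most_common():
        print(wchan, n)
```

Splitting on whitespace only works when every row has a value in each column
before the wait channel, so real output should be checked by eye first.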

My system runs RELENG_7 from within the last week; I regularly run make
world to keep it up to date.

>
> Alternatively, if things are stuck in "nfsreq", it may be useful to use
> tcpdump to look at the NFS requests your client is making.  nfsstat can
> also be useful as you can see which counters are increasing during a hang.
>

While the system was stuck, the nfsstat counters grew slowly. Only the
read, write, create, and remove RPC counts appeared to increase.
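
One way to see which counters move during a hang is to save two nfsstat
snapshots and compare them. The sketch below assumes the output consists of a
row of counter names followed by a row of integers; the snapshot text and its
numbers are invented for illustration, and the helper names are hypothetical:

```python
# Invented snapshots in a names-row / values-row layout (hypothetical data).
BEFORE = """\
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
     1000       200       500         0      4000      3000       150       120
"""

AFTER = """\
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
     1000       200       500         0      4003      3002       151       121
"""


def parse_counters(text: str) -> dict:
    """Pair each row of counter names with the row of integers below it."""
    counters = {}
    lines = text.splitlines()
    for names_line, values_line in zip(lines, lines[1:]):
        names = names_line.split()
        values = values_line.split()
        if (names and len(names) == len(values)
                and all(v.isdigit() for v in values)
                and not any(n.isdigit() for n in names)):
            # If a name repeats in a later section, the later value wins.
            counters.update(zip(names, map(int, values)))
    return counters


def increased(before: dict, after: dict) -> dict:
    """Return only the counters that grew, with the size of the increase."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}


if __name__ == "__main__":
    delta = increased(parse_counters(BEFORE), parse_counters(AFTER))
    for name, grew_by in delta.items():
        print(name, grew_by)
```

Taking the snapshots a fixed interval apart makes the deltas comparable
between a normal period and a hang.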

As for tcpdump, I am not familiar with it, so I will read some
documentation and run some tests.

Thanks very much for your kind help. I hope the problem can be solved soon.

Sincerely,
Tim Chen