Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000

Tue Sep 16 19:49:28 UTC 2008

On Tuesday 16 September 2008 02:02:14 am Tim Chen wrote:
> On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <jhb at freebsd.org> wrote:
> 
> > On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > > I am currently running a mail server with a NetApp filer as backend
> > > storage. From time to time, the whole system gets stuck for 3-5
> > > minutes, but after that everything recovers normally. During the
> > > "stuck" moment, 'ps auxw' shows 200-300 mail delivery agent (MDA)
> > > processes in "D" status. The df command does not respond either.
> >
> > Can you use 'ps axl' to determine the wait message ("wchan") of the
> > stuck threads when they hang?  If it is "lockf", then make sure you
> > have an up-to-date RELENG_6 kernel as there was a recent fix for a
> > "lockf" hang.
> >
> 
> Thanks for your suggestion. After running 'ps axl', it seems all the
> processes in "D" status were in nfs, nfsreq, nfsreq. Can you give me some
> hints on how to keep digging into the problem?
> 
> My system is RELENG_7, updated within the last week. I always run make
> world to keep my system up to date.
> 
> 
> >
> > Alternatively, if things are stuck in "nfsreq", it may be useful to use
> > tcpdump to look at the NFS requests your client is making.  nfsstat can
> > also be useful as you can see which counters are increasing during a
> > hang.
> >
>
> When the system was stuck, the nfsstat counters grew slowly. It seems only
> the read, write, create, and remove RPC counts increased.
> 
> As for tcpdump, since I am not familiar with it, I will read some
> documentation and run some tests.
> 
> Thanks very much for your kind help. I hope the problem can be solved soon.

Also, do the nfsstat thing I suggested.  During a hang, you can do something 
like 'nfsstat > one ; sleep 1 ; nfsstat > two' and compare the 'one' 
and 'two' files to see which counters (if any) are being bumped during the 
hang.
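The same one/two comparison can be wrapped in a small POSIX sh function so it is easy to repeat during each hang. This is a minimal sketch, not an existing tool: the function name nfs_counter_diff, the temporary file paths, and the STAT_CMD override are hypothetical. nfsstat prints its counters as rows of values under header rows, so a changed line in the diff is a row of numbers; line it up with the header above it in the full output to see which RPCs moved.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of FreeBSD): take two
# nfsstat snapshots a short interval apart and print only the lines
# that differ.
#   $1        seconds between snapshots (default 1)
#   STAT_CMD  command to snapshot (default nfsstat); a hypothetical
#             override so the function can be tried without NFS
# Returns diff's status: 0 = nothing changed, 1 = some lines changed.
nfs_counter_diff() {
    interval=${1:-1}
    cmd=${STAT_CMD:-nfsstat}
    # $$ (the shell's PID) keeps runs from different shells apart.
    one="${TMPDIR:-/tmp}/nfsstat.one.$$"
    two="${TMPDIR:-/tmp}/nfsstat.two.$$"

    # $cmd is left unquoted on purpose so STAT_CMD may carry arguments.
    $cmd > "$one"
    sleep "$interval"
    $cmd > "$two"

    # Changed lines are the counter rows that moved during the interval.
    diff "$one" "$two"
    status=$?
    rm -f "$one" "$two"
    return "$status"
}
```

For example, running nfs_counter_diff 1 during a hang and getting no output (exit status 0) would mean no counters moved in that second; any diff output shows which rows changed.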

-- 
John Baldwin