possible NFS lockups

Wed Jul 28 07:18:44 UTC 2010

I have a few remarks and questions. What happens when the system is in this state: does access to the mount fail and come back after a while, or do you need to remount? Under normal conditions access should be restored automatically. The error message itself indicates a busy server and should clear up after a while, as you have seen.
How frequently do you see the error: once per hour, once per day? When you say filer, I assume you mean a NetApp filer. It might be worth taking a perfstat when the error happens and while the condition exists. I don't think dtrace will really help, since this looks like a server issue to me.
As the filer is used to store mail, I assume we are talking about qmail or a similar environment with a huge number of small files. I would like to know what the directory structure looks like on the filer.
If possible, get a perfstat and send it, together with the directory structure, to me off-list and I will have a look.
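
To put a number on the frequency question, something like the following rough sketch could tally the "not responding" / "is alive again" kernel messages per hour. It assumes the messages also land in a syslog file in the usual "Mon DD HH:MM:SS host kernel: ..." layout; the function name and the regular expression are my own illustration, not anything shipped with FreeBSD.

```python
import re
from collections import Counter

# Assumption: standard syslog layout, e.g.
# "Jul 27 16:29:01 host kernel: nfs server 10.44.17.138:/vol/vol1/mail: not responding"
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+.*"
    r"nfs server (?P<server>\S+): (?P<event>not responding|is alive again)"
)


def tally_nfs_events(lines):
    """Count NFS timeout/recovery messages per (hour, server, event)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an NFS client status message
        hour_bucket = f"{m['mon']} {int(m['day']):02d} {m['hour']}:00"
        counts[(hour_bucket, m["server"], m["event"])] += 1
    return counts


def print_report(counts):
    """Print one line per bucket, in sorted key order."""
    for (hour_bucket, server, event), n in sorted(counts.items()):
        print(f"{hour_bucket}  {server}  {event}: {n}")
```

Calling `print_report(tally_nfs_events(open("/var/log/messages")))` on an affected node would show whether the timeouts cluster at particular times of day (the path is an assumption; adjust it to wherever kernel messages are logged).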

-Andreas 
-------- Original Message --------
> Date: Tue, 27 Jul 2010 20:55:42 +0100
> From: krad <kraduk at googlemail.com>
> To: freebsd-hackers at freebsd.org, FreeBSD Questions <freebsd-questions at freebsd.org>
> Subject: Re: possible NFS lockups

> On 27 July 2010 16:29, krad <kraduk at googlemail.com> wrote:
> 
> > I have a production mail system with an nfs backend. Every now and again we
> > see the nfs die on a particular head end. However it doesn't die across all
> > the nodes. This suggests to me there isn't an issue with the filer itself and
> > the stats from the filer concur with that.
> >
> > The symptoms are lines like this appearing in dmesg
> >
> > nfs server 10.44.17.138:/vol/vol1/mail: not responding
> > nfs server 10.44.17.138:/vol/vol1/mail: is alive again
> >
> > trussing df it seems to hang on getfsstat, this is presumably when it tries
> > the nfs mounts
> >
> > eg
> >
> > __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
> > mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1746583552 (0x681ac000)
> > mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1747632128 (0x682ac000)
> > munmap(0x681ac000,344064)                        = 0 (0x0)
> > getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)
> >
> >
> > I have played with mount options a fair bit but they don't make much
> > difference. This is what they are set to at present
> >
> > 10.44.17.138:/vol/vol1/mail     /mail/0 nfs rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0
> >
> > When this locking is occurring I find that if I do a showmount or mount
> > 10.44.17.138:/vol/vol1/mail again under another mount point I can access
> > it fine.
> >
> > One thing I have just noticed is that lockd and statd always seem to have
> > died when this happens. Restarting does not help
> >
> >
> > I find all this a bit perplexing. Can anyone offer any help into why this
> > might be happening? I have dtrace compiled into the kernel if that could
> > help with debugging.
> >
> 
> Sorry, I missed a bit of critical info:
> 
> # uname -a
> FreeBSD X 8.1-STABLE FreeBSD 8.1-STABLE #2: Mon Jul 26 16:10:19 BST 2010 root at mk-pimap-7.b2b.uk.tiscali.com:/usr/obj/usr/src/sys/DTRACE  i
