NFS == lock && reboot

Chris H. chris# at 1command.com
Wed Apr 4 14:39:51 UTC 2007


Quoting Oliver Fromme <olli at lurza.secnetix.de>:

> Chris H. <chris#@1command.com> wrote:
> > Thomas David Rivers wrote:
> > > I have found that if I kill rpc.lockd on the NFS server,
> > > most of the NFS issues I have (including a similar lock-up on
> > > 6.1-RELEASE) go away.
>
> FWIW, I also had problems with running rpc.lockd and
> rpc.statd (no panics, though).  If you don't need them
> (i.e. you don't need cross-machine locking), then don't
> use them.  Use the -L flag to mount_nfs so at least
> local locking works.
>
> > You don't happen to have any experiences keeping rpc.statd
> > running?
>
> Basically, it doesn't make much sense to run one without
> the other.  If you disable rpc.lockd, you can also safely
> disable rpc.statd.
>
> However, I don't think that your actual problem (lock-up
> and panics) is related to rpc.lockd or rpc.statd.  It
> rather sounds like something else is wrong with your
> machine.  NFS works perfectly fine for me, including
> copying huge files.
>
> You wrote that you had a lot of crashes that accumulated
> many files in lost+found.  Well, maybe your filesystem
> was somehow damaged in the process.  It is possible to
> damage file systems in a way that can lead to panics, and
> it's not necessarily detected and repaired by fsck.

Indeed, I /too/ considered this. However, I largely dismissed it as a
possibility, since almost all of those files are zero-length. The others
are fragments of logs. I'm not /completely/ ruling it out, though.
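A minimal sketch of the lockd/statd change discussed above, assuming a FreeBSD 5.x/6.x machine that starts both daemons from /etc/rc.conf. The variable names are the ones listed in /etc/defaults/rc.conf; check your own copy before relying on them. Because rpc.lockd and rpc.statd work as a pair, both are turned off together:

```shell
# /etc/rc.conf fragment (sketch): do not start the NFS lock daemons at boot.
# rpc.lockd and rpc.statd only make sense together, so disable both.
rpc_lockd_enable="NO"
rpc_statd_enable="NO"
```

With the daemons off, mounting with `mount_nfs -L` keeps fcntl(2) locks local to the client, so local locking still works. For example: `mount_nfs -L server:/export /mnt`. The server name and paths here are hypothetical. The change takes effect at the next boot, unless the running daemons are stopped by hand.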

>
> > > > # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
> > > >
> > > > Fatal double fault
> > > > eis 0x0blah
> > > > eiblah blah0x
> > > > panic double fault
> > > > no dump device defined
>
> You should try to set up a dump device, so that you get a kernel
> crash dump next time.  The crash dump can be used to find
> out where the crash occurred -- and I bet it's not in the
> NFS code.
>
> See the Handbook for details on how to set up a dump device.
>
> By the way, does the problem also occur when copying the
> file to/from a memory disk, so no physical disk is involved?
> That way you would exclude the disk and the disk driver as
> potential causes.  Similarly, try a loopback NFS mount
> (i.e. mount from 127.0.0.1) in order to exclude the network
> interface driver as a potential cause.
>
> If the problem still exists when copying a 10 MB file from
> a memory disk to a memory disk (same or other) via a
> localhost mount on the same machine, then it looks like
> the NFS code might be at fault.
>
> Best regards
>   Oliver

All good advice. As a quick experiment, I'm going to /initially/ take
the easy way out (remove lockd/statd from rc.conf). Then I'll endeavour
to investigate further using your suggestions.
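For the dump-device suggestion, a minimal /etc/rc.conf sketch for a 5.x-era system might look like the following. The device name /dev/ad0s1b is an assumption: it is a typical first swap partition on an ATA disk, and `swapinfo` shows the real one. With dumpdev set, savecore(8) should copy the crash dump into dumpdir at the next boot. The documentation Oliver points to covers the details, including how large the dump device has to be.

```shell
# /etc/rc.conf fragment (sketch): enable kernel crash dumps.
# ASSUMPTION: /dev/ad0s1b is the swap partition; substitute the device
# reported by swapinfo(8) on the actual machine.
dumpdev="/dev/ad0s1b"
# Directory where savecore(8) writes the dump at boot (the usual default).
dumpdir="/var/crash"
```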

Thank you very much for your time and your thoughtful answer.

--Chris


>
> --
> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
> Commercial register: Registry Court Munich, HRA 74606,  Management:
> secnetix Verwaltungsgesellsch. mbH, Commercial register: Registry Court
> Munich, HRB 125758,  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart
>
> FreeBSD services, products and more:  http://www.secnetix.de/bsd
>
> "C++ is the only current language making COBOL look good."
>        -- Bertrand Meyer
>



-- 
panic: kernel trap (ignored)



-----------------------------------------------------------------
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
/////////////////////////////////////////////////////////////////



More information about the freebsd-stable mailing list