Reproducible, possibly NFS related, fatal double fault in 6.2-R-p7

Kris Kennaway kris at FreeBSD.org
Tue Oct 30 12:58:46 PDT 2007


Chris H. wrote:
> Quoting Kris Kennaway <kris at freebsd.org>:
> 
>> Clifton Royston wrote:
>>> On Tue, Oct 16, 2007 at 01:01:46PM -0700, Chris H. wrote:
>>>> excerpt from this list titled: NFS == lock && reboot, that I posted 
>>>> follows:
>>>>
>>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>> # uname -a
>>>> FreeBSD host.domain.tld 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 
>>>> 26 16:27:14 PST 2007
>>>>
>>>> Greetings,
>>>> Does anyone know when NFS and friends will be working again? I haven't
>>>> been able to /safely/ use it from 4.8 on. I remember some talk on the
>>>> list some time ago, and then it seemed to be resolved, as the
>>>> discussion ended. So I thought it was fixed. Seems not. :(
>>>>
>>>> My scenario;
>>>> mount host off root:
>>>> mount script exec'd follows...
>>>>
>>>> #!/bin/sh -
>>>> mount -t nfs host.domain.tld:/ /host
>>>> mount -t nfs host.domain.tld:/var /host/var
>>>>
>>>> confirm mount...
>>>>
>>>> # ls /host
>>>> .snap    COPYRIGHT    bin
>>>> ...
>>>> usr    var    tmp
>>>>
>>>> OK looks good...
>>>>
>>>> # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
>>>>
>>>> Fatal double fault
>>>> eis 0x0blah
>>>> eiblah blah0x
>>>> panic double fault
>>>> no dump device defined
>>>> rebooting in 15sec...
>>>>
>>>> Hmmm... that's not good. :(
>>>>
>>>> ------8<---SNIP---8<-----SNIP-----8<-------
>>>>
>>>> My final solution was to change the lines in /etc/rc.conf
>>>> from:
>>>> nfs_client_enable="YES"
>>>> nfs_reserved_port_only="YES"
>>>> nfs_server_enable="YES"
>>>> rpc_lockd_enable="YES"
>>>> rpc_statd_enable="YES"
>>>> rpcbind_enable="YES"
>>>>
>>>> to:
>>>> nfs_client_enable="YES"
>>>> nfs_reserved_port_only="YES"
>>>> nfs_server_enable="YES"
>>>> #rpc_lockd_enable="YES"
>>>> #rpc_statd_enable="YES"
>>>> rpcbind_enable="YES"
>>>>
>>>> Making those changes ended the "Fatal double fault && reboot in 15 
>>>> seconds..."
>>>
>>>   Thanks for this very timely mention!  The cluster of servers I am
>>> about to upgrade from 4.8 <embarrassed cough> to 6.2 relies heavily on
>>> NFS to an old Netapp.  If I have got to disable rpc_lockd and
>>> rpc_statd, it's good to know that now!
>>>    Can I ask, can anybody confirm that they're running 6.2 on NFS
>>> successfully *with* lockd and statd?
>>
>> Er, yes, of course it does.  The old message he is quoting is bogus on 
>> its own,
> While I'll grant you that I haven't *yet* found/taken the time to create a
> dump device, re-enable rpc_lockd && rpc_statd, and cp a 10Mb file to the
> mount point to produce an *instantaneous* "Fatal double fault", I don't
> think it's fair to label my original post entirely /bogus/ - especially in
> light of the recent post I replied to, which seems to have some very
> common ground.
> I should probably mention that since my last posting (my original thread),
> I have some 20+ RELENG_6_2 boxen that *do* have rpc_lockd + rpc_statd
> enabled, yet none of them produce a "Fatal double fault". They are all
> Tyan SMP boards with dual onboard fxp's - as opposed to the Nvidia UP
> board, which has a single onboard nve. They are all inter-connected via NFS.
> I have a 750Gb drive hanging off the /problematic/ Nvidia board that I
> had intended to use for NFS backups, but given the NFS issue I had with
> it, it didn't seem to be the best solution. If anyone felt like throwing
> me a "cheat sheet" for creating a dump device out of that drive and a
> "quickie" for producing a backtrace, I'm sure I'd be better able to find
> the required time to produce the required information. I'm sorry. It's
> just that I'm a hundred million miles away from that right now, as I've
> been building several large web applications, and their deadline is fast
> approaching. FWIW I bounced all the servers today, and therefore have
> recent /verbose/ dmesg's, should any of the information they provide be
> of any help/use to anyone.
> 
> Take care. :)

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

It's very unlikely that NFS is relevant to the problem (which, together 
with the lack of debugging information, is what made the report bogus), 
and likely that nve is the cause.  The above URL explains in detail how 
to obtain the debugging information needed to confirm this.
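
As a rough sketch of what that chapter describes (not a substitute for it): 
crash dumps are enabled through two rc.conf variables, and the saved dump 
is then opened in kgdb.  The device name, kernel path and dump number in 
the comments below are placeholders that will differ per machine, and the 
dump device is normally a swap partition at least as large as RAM.

```shell
# /etc/rc.conf additions (sketch; check rc.conf(5) on your release)
dumpdev="AUTO"        # use the first swap device listed in /etc/fstab
dumpdir="/var/crash"  # where savecore(8) writes vmcore.N at next boot

# To pick a dump device by hand instead (placeholder device name):
#   dumpon /dev/ad4s1b
# After the panic and reboot, with a kernel built with debug symbols
# (placeholder kernel path and dump number):
#   kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug /var/crash/vmcore.0
#   (kgdb) bt
```

The backtrace printed by "bt", together with the panic message, is the 
information to post to the list.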

Kris



More information about the freebsd-stable mailing list