mountd has resolving problems

Thu Feb 17 17:00:10 UTC 2011

On Thursday, February 17, 2011 7:18:28 am Steven Hartland wrote:
> This has become an issue for us in 8.x as well.
> 
> I'm pretty sure that before 8.x these NFS mounts would simply background, but
> recently machines have been failing to boot. It seems that a failure to
> look up NFS mount point hosts is now a fatal error :(
> 
> We've just tried Jeremy's netwait script and it works perfectly so either
> this or something similar needs to get pushed into base.
> 
> For reference the reason we need a delay here is our core Cisco router
> takes a while to bring the port up properly on boot.
> 
> Thanks for sharing the script Jeremy :)

I use a similar hack that waits up to 30 seconds for the default gateway to be
pingable.  I think the problem is at least partly related to the new ARP code,
which now drops packets in IP output if the link is down.  This can be very
problematic during boot, since some interfaces take a few seconds to negotiate
link.  With the new check in IP output, the attempt to send the packet fails
with an error, which causes gethostbyname() and getaddrinfo() to fail
immediately without doing any retries.  In 7 the packet would either sit in
the descriptor ring until link was up or be dropped, but either way the
failure was silent, so the resolver in libc would just retry after 30 seconds
or so, at which point it would work fine.

Waiting for the default route to be pingable also fixed a few other problems
for us on 7 (for example, ntpdate often failed at boot and now works
reliably), so that is the approach we went with.
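
A wait of this kind can be sketched roughly as follows.  This is not the
script either poster actually runs; the function names wait_for and
default_gateway are made up for illustration, and the route(8) and ping(8)
invocations assume FreeBSD's output format and flags.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command about once per second until it
# succeeds or the timeout (in seconds) expires.
# Usage: wait_for TIMEOUT COMMAND [ARGS...]
wait_for() {
    deadline=$(( $(date +%s) + $1 )); shift
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Any exit status of 0 from the probe counts as "network is usable".
        "$@" >/dev/null 2>&1 && return 0
        sleep 1
    done
    return 1
}

# Hypothetical helper: print the default gateway address.
# Assumes the "gateway:" line printed by FreeBSD's "route -n get default".
default_gateway() {
    route -n get default 2>/dev/null | awk '/gateway:/ { print $2 }'
}

# Example use early in boot (assumes FreeBSD ping, where -t is a timeout
# in seconds):
#   gw=$(default_gateway)
#   [ -n "$gw" ] && wait_for 30 ping -c 1 -t 1 "$gw"
```

The probe is passed in as a command, so the same loop could wait on a name
lookup instead of a ping if resolution is what actually matters for the
service being started.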

>     Regards
>     Steve
> 
> ----- Original Message ----- 
> From: "Jeremy Chadwick" <freebsd at jdc.parodius.com>
> To: "Olaf Seibert" <O.Seibert at cs.ru.nl>
> Cc: <freebsd-stable at freebsd.org>
> Sent: Thursday, September 09, 2010 2:05 PM
> Subject: Re: mountd has resolving problems
> 
> 
> > On Thu, Sep 09, 2010 at 03:10:17PM +0200, Olaf Seibert wrote:
> >> I just upgraded a box from 8.0 to 8.1, and already when rebooting with
> >> the new kernel (i.e. before installing new userland), I got the
> >> following problem.
> >> 
> >> Of course many of the messages scrolled off screen, but some were
> >> preserved in the syslog.
> >> 
> >> Sep  9 14:26:51 fourquid mountd[839]: can't get address info for host XYZ
> >> Sep  9 14:26:51 fourquid mountd[839]: bad host XYZ in netgroup vbgroup, skipping
> >> 
> >> Mountd was run and wanted to determine which hosts to export to.
> >> However, it could not resolve any of them. So, that suggests some
> >> network issue.
> >> 
> >> However, I use a static IP address (no DHCP) and static info in
> >> /etc/resolv.conf, using one of the university's name servers. So
> >> resolving should always be available.
> >> 
> >> Running /etc/rc.d/mountd restart so far always solved the export
> >> problem.
> >> 
> >> I have also seen (presumably similar) issues with mounting NFS file
> >> systems, but that was deemed so fatal that the boot was aborted. A mount
> >> ``by hand'' of the affected file system also worked.
> >> 
> >> Any ideas? Maybe with the new kernel the network interface is a bit
> >> slower in coming up, and not fully working by the time /etc/rc.d/mountd
> >> runs? In fact, I now notice this sequence of messages in
> >> /var/log/messages:
> >> 
> >> Sep  9 14:26:51 fourquid mountd[839]: bad host XYZ in netgroup vbgroup, skipping
> >> Sep  9 14:26:51 fourquid mountd[839]: bad exports list line /xxxxxx
> >> Sep  9 14:26:54 fourquid kernel: fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.8
> >> Sep  9 14:26:54 fourquid init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
> >> Sep  9 14:26:55 fourquid kernel: nfe0: link state changed to UP
> >> 
> >> so here the network interface takes a full 4 more seconds to come up,
> >> after it was already needed.
> >> 
> >> I can try to put a 10 sec delay somewhere, but there should be a better
> >> solution...
> > 
> > The problem is that the network isn't "truly" up and available by the
> > time mountd runs, and therefore DNS resolution doesn't work.  Please use
> > my netwait script to solve this problem:
> > 
> > http://jdc.parodius.com/freebsd/netwait
> > 
> > Place it in /usr/local/etc/rc.d, make sure it's chmod'd to 755,
> > then enable use of it by using /etc/rc.conf variables like so:
> > 
> > netwait_enable="yes"
> > netwait_ip="4.2.2.1 4.2.2.2"
> > netwait_if="nfe0"
> > 
> > For what the variables do, please see the script comments.
> > 
> > -- 
> > | Jeremy Chadwick                                   jdc at parodius.com |
> > | Parodius Networking                       http://www.parodius.com/ |
> > | UNIX Systems Administrator                  Mountain View, CA, USA |
> > | Making life hard for others since 1977.              PGP: 4BD6C0CB |
> > 
> > _______________________________________________
> > freebsd-stable at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
> >
> 

-- 
John Baldwin