mountd has resolving problems

Thu Feb 17 19:46:32 UTC 2011

On Thu, 17 Feb 2011, John Baldwin wrote:

> On Thursday, February 17, 2011 7:18:28 am Steven Hartland wrote:
>> This has become an issue for us in 8.x as well.
>>
>> I'm pretty sure that before 8.x these NFS mounts would simply be
>> backgrounded, but recently machines have been failing to boot. It seems
>> that failure to look up NFS mount point hosts is now a fatal error :(
>>
>> We've just tried Jeremy's netwait script and it works perfectly so either
>> this or something similar needs to get pushed into base.
>>
>> For reference, the reason we need a delay here is that our core Cisco
>> router takes a while to bring the port up properly on boot.
>>
>> Thanks for sharing the script Jeremy :)
>
> I use a similar hack that waits up to 30 seconds for the default gateway to be
> pingable.  I think it is at least partly related to the new ARP code that now
> drops packets in IP output if the link is down.

I use hackish pings (ping -t <timeout>, with the timeout much smaller than
30 seconds since even 2 seconds is annoying) and traceroutes in
/etc/rc.d/netif.  I don't know if it is the same problem.  Here it affects
mainly NFS and ntpdate/ntpd to local systems, even with all-static routes.
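
For illustration only, here is a minimal sketch of the kind of wait loop
discussed above: probe a host (for example the default gateway) until it
answers or a deadline passes.  It is not the netwait script or anyone's
actual rc.d hack.  The names wait_for_host and ping_once are hypothetical,
192.0.2.1 is a placeholder address, and the ping invocation assumes
FreeBSD's ping, where -t is a timeout in seconds.  The 30 second default
is taken from the quoted message.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one ICMP echo request and report whether a reply arrived.

    Assumption: FreeBSD ping(8), where -t is a timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-t", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_host(host, max_wait_s=30.0, interval_s=1.0,
                  probe=ping_once, sleep=time.sleep, clock=time.monotonic):
    """Probe host until it answers or max_wait_s has elapsed.

    Returns True as soon as a probe succeeds, False once the deadline
    has passed without a successful probe.
    """
    deadline = clock() + max_wait_s
    while True:
        if probe(host):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Placeholder gateway address; substitute the real default route.
    if not wait_for_host("192.0.2.1"):
        print("gateway still unreachable, continuing anyway")
```

The probe, sleep and clock arguments are injectable only so that the loop
can be exercised without a network.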

> This can be very problematic
> during boot since some interfaces take a few seconds to negotiate link but
> the end result of the new check in IP output is that the attempt to send the
> packet fails with an error causing gethostbyname() and getaddrinfo() to fail
> completely without doing any retries.  In 7 the packet would either sit in the

This also happens after a down/up of the interface to change something.
If you try to use the network before it is back, then you have to wait
much longer before it is really back.  This is a relatively minor problem
since down/up is not needed routinely.

> descriptor ring until link was up, or it would be dropped, but it would
> silently fail, so the resolver in libc would just retry in 30 seconds or so at
> which time it would work fine.
>
> Waiting for the default route to be pingable also fixed a few other
> problems for us on 7 (often ntpdate would not work on boot and now it
> works reliably, etc.), so we went with that route.
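
To make the resolver point in the quoted text concrete: when the lookup
fails immediately instead of being retried, a caller can only recover by
retrying itself.  The following is a rough application-level sketch of
that idea, not a description of what libc or mountd does.  The name
resolve_with_retry and its defaults are hypothetical.

```python
import socket
import time


def resolve_with_retry(host, port=None, attempts=5, delay_s=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Call resolver(host, port), retrying on failure.

    Makes at most `attempts` calls, sleeping delay_s seconds between
    them, and re-raises the last error if every call fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
            if attempt + 1 < attempts:
                sleep(delay_s)
    raise last_error
```

This only papers over the failure from the caller's side; it does not
change how the send error is reported while the link is down.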

I thought I first saw the problem a little earlier, and it affected bge more
than fxp.  Maybe the latter is correct and the problem is smaller with fxp
just because it is ready sooner.

Bruce