Re: RFC: mount_nfs failure due to dns not running yet

From: Daniel Mayfield <dan_at_3geeks.org>
Date: Thu, 20 Feb 2025 00:29:42 UTC

On 2/19/25 5:40 PM, Rick Macklem wrote:
> Hi,
>
> The subject line basically describes the problem glebius@
> ran into.  When doing an NFS mount in /etc/fstab, it failed
> since the DNS service was not yet working and, as such,
> the DNS lookup of the server fqdn failed, causing the mount
> to fail. Note that this behaviour has existed for decades.
>
> He feels this is a bug and that mount_nfs(8) should retry
> getaddrinfo(3) calls until success, instead of failing the
> mount when the first attempt fails.
> The problem with just retrying getaddrinfo(3) is that it
> could retry forever for simple failures like a typo in the
> server fqdn.
> I can see several ways this can be handled and would
> like feedback from others w.r.t. these alternatives.
>
> 1) Simply document this case and encourage use of
>      host names in /etc/hosts for NFS servers along with
>      specifying use of file before dns in nsswitch.conf.
>       Doing this results in the mounts working whether or
>        not DNS is working.
>
> 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
>       until it succeeds. (I feel this would be a POLA violation,
>       given that the current behaviour has existed for decades
>       and for simple cases where the fqdn will never resolve
>       the behaviour would be to hang at the mount attempt
>       during boot unless "bg" is specified for the /etc/fstab entry.)
>
> 3) Add a new NFS mount option "retrydns=<N>", which would enable
>      retries of getaddrinfo(3). This would avoid any POLA violation and
>      would allow for a convenient way to document the behaviour in
>      "man mount_nfs".
>
> 4) ???

Split the difference?  Make the retry count configurable: -1 for "try
forever", default to 3, configurable up to insanity.  Also, rather than
retrying only on DNS failure, retry on just about any failure except an
actual administrative failure (mountd refusing the mount, for example).
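
A minimal sketch of that retry policy, written in Python for brevity
rather than as a patch to mount_nfs(8). The names MountRefused and
should_retry are hypothetical, and reading "default to 3" as three
retries after the first attempt (rather than three attempts in total) is
an assumption:

```python
class MountRefused(Exception):
    """Hypothetical stand-in for an administrative failure,
    e.g. mountd refusing the mount."""


def should_retry(exc, failed_attempts, retries=3):
    """Decide whether to try again after `failed_attempts` failures.

    retries == -1 means retry forever; otherwise allow at most
    `retries` retries after the first attempt (an assumed reading
    of "default to 3").
    """
    # Administrative refusals are final: retrying will not help.
    if isinstance(exc, MountRefused):
        return False
    # -1 is the "try forever" setting.
    if retries == -1:
        return True
    # Any other failure (DNS lookup, server not up yet, ...) is
    # treated as transient until the retry budget is used up.
    return failed_attempts <= retries
```

With the default of 3, the first three failures are retried and the
fourth gives up; a MountRefused is never retried, even with -1.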

If this gets added, there should be either an exponential backoff with a
configurable maximum (default to 30s), or a configurable static delay
(default to 3s? 10s?).  The mount_nfs process should log loudly every
time the delay gets triggered.
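
To make the backoff variant concrete, here is a rough Python sketch, not
mount_nfs code. The function names, the 1s starting delay, and the
doubling factor are all assumptions; only the 30s cap, the -1 "forever"
value, and the log-on-every-delay behaviour come from the suggestion
above:

```python
import logging
import time

log = logging.getLogger("mount_retry_sketch")  # hypothetical logger name


def backoff_delays(base=1.0, max_delay=30.0):
    """Yield base, 2*base, 4*base, ... with each value capped at max_delay."""
    delay = base
    while True:
        yield min(delay, max_delay)
        delay = min(delay * 2, max_delay)


def retry_with_backoff(attempt, retries=3, base=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call attempt() until it succeeds or the retry budget runs out.

    retries counts retries after the first attempt; -1 means forever.
    Only OSError is treated as retryable here (socket.gaierror, which
    Python raises for getaddrinfo failures, is a subclass of it).
    """
    delays = backoff_delays(base, max_delay)
    tries = 0
    while True:
        try:
            return attempt()
        except OSError as exc:
            if retries != -1 and tries >= retries:
                raise  # budget exhausted: report the last failure
            delay = next(delays)
            tries += 1
            # Log loudly every time a delay is triggered.
            log.warning("attempt failed (%s); retry %d in %.0fs",
                        exc, tries, delay)
            sleep(delay)
```

The static-delay variant is the same loop with a constant in place of
next(delays).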

Honestly, this would be handy in any number of crazy situations where 
you have a need to wait for something else to start.  I've been bitten 
by the "just fail the mount" behavior before, but I worked around it 
instead of thinking of changing the behavior.

Daniel