From: Toomas Soome <tsoome@me.com>
Message-Id: <862576B0-EFBF-4CC9-B99A-723125D60983@me.com>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Subject: Re: RFC: mount_nfs failure due to dns not running yet
Date: Fri, 21 Feb 2025 10:12:54 +0200
Cc: Steve Rikli, Gleb Smirnoff, Rick Macklem
To: FreeBSD CURRENT

> On 21. Feb 2025, at 04:39, Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 4:28 PM Steve Rikli <sr@genyosha.net> wrote:
>>
>> On Wed, Feb 19, 2025 at 02:40:15PM -0800, Rick Macklem wrote:

>>> The subject line basically describes the problem glebius@
>>> ran into. When doing an NFS mount in /etc/fstab, it failed
>>> since the DNS service was not yet working and, as such,
>>> the DNS lookup of the server fqdn failed, causing the mount
>>> to fail. Note that this behaviour has existed for decades.
>>>
>>> He feels this is a bug and that mount_nfs(8) should retry
>>> getaddrinfo(3) calls until success, instead of failing the
>>> mount when the first attempt fails.
>>> The problem with just retrying getaddrinfo(3) is that it
>>> could retry forever for simple failures like a typo in the
>>> server fqdn.
>>> I can see several ways this can be handled and would
>>> like feedback from others w.r.t. these alternatives.

>>> 1) Simply document this case and encourage use of
>>>    host names in /etc/hosts for NFS servers along with
>>>    specifying use of files before dns in nsswitch.conf.
>>>    Doing this results in the mounts working whether or
>>>    not DNS is working.
>>>
>>> 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
>>>    until it succeeds. (I feel this would be a POLA violation,
>>>    given that the current behaviour has existed for decades
>>>    and for simple cases where the fqdn will never resolve
>>>    the behaviour would be to hang at the mount attempt
>>>    during boot unless "bg" is specified for the /etc/fstab entry.)
>>>
>>> 3) Add a new NFS mount option "retrydns=<N>", which would enable
>>>    retries of getaddrinfo(3). This would avoid any POLA violation and
>>>    would allow for a convenient way to document the behaviour in
>>>    "man mount_nfs".
>>>
>>> 4) ???
>>>
>>> So, what do you think is the preferred change?

>> I don't think I would change mount_nfs code behavior for this.
>>
>> That is, requiring services and daemons etc. to work around missing,
>> misconfigured, slow, or misbehaving nameservice (whether it's DNS,
>> /etc/hosts, NIS, whatever) seems like more complexity, possibly not
>> effective, and maybe not focused on the right thing.
>>
>> Now, without meaning to be presumptuous, it may be worth re-examining
>> the startup sequence, e.g. to make sure NFS mounts are tried after the
>> known dependencies can reasonably be expected to have started, including
>> the network, plus local_unbound or bind (if used), possibly others.
>>
>> After a quick look, I don't see an obvious problem with the sequence,
>> but more knowledgeable eyes than mine are welcome. I don't quite follow
>> some of the output from rcorder and service -r.

>>> ps: I looked and the return value from getaddrinfo(3) does not
>>>     appear to be useful to discern the case of "DNS service not
>>>     running yet". (I think it replies EAI_FAIL for this case.)
>>
>> In that area, I'll note FreeBSD rc.d has a "NETWORKING" dependency for
>> PROVIDE and REQUIRE, and it's included in scripts like nfsclient,
>> mountcritremote et al. However there seems to be no similar dependency
>> for something like "NAMESERVICE" (generic, as opposed to "named"
>> specifically), and I'm not sure how that might be implemented, even
>> assuming it could be useful in a situation like this.
>>
>> I.e. there are many things to potentially check for "can the system
>> resolve hostnames yet", and not all of them involve running a local
>> instance of named, unbound, etc.
>>
>> In general, if I were running into problems with nameservice not being
>> available by the time NFS mounts happen, I think I'd start by looking
>> into possible nameservice issues, then check out some mechanisms other
>> folks have mentioned (fstab IP addresses or late option, rc.conf
>> netwait_enable, etc.) rather than coding workarounds into NFS itself.
> Well, the patch I have created (it took about 15min) only changes behaviour
> if a new "retrydns" option is used. As such, I think it might be useful for some,
> but doesn't change things unless someone uses it.

> I agree with you that I don't think the rc scripts have a way to check REQUIRE
> dns working. (I, personally, always put the fqdn for NFS servers in /etc/hosts
> and make sure "files" is first in nsswitch.conf, but others argue that is not
> feasible for some deployments. Using IP numbers works for AUTH_SYS,
> but not Kerberized mounts.)
>
> Note that there is already "retrycnt", which specifies retries of the mount,
> but that retry loop doesn't include getaddrinfo(3) calls.
> --> Personally, I do not like always doing retries since I often
>     type mount commands manually and I'm a terrible typist, so I
>     often mistype the server's name.
>
> This reply was mostly a followup on all the good comments and
> not just yours.
>
> Thanks everyone, for your comments, rick


My 2 cents:

There is a difference between the name service not responding and a name not resolving. In the first case, it will go to:

     bg      If an initial attempt to contact the server fails, fork
             off a child to keep trying the mount in the background.
             Useful for fstab(5), where the file system mount is not
             critical to multiuser operation.

     bgnow   Like bg, fork off a child to keep trying the mount in the
             background, but do not attempt to mount in the foreground
             first.  This eliminates a 60+ second timeout when the
             server is not responding.  Useful for speeding up the
             boot process of a client when the server is likely to be
             unavailable.  This is often the case for interdependent
             servers such as cross-mounted servers (each of two
             servers is an NFS client of the other) and for cluster
             nodes that must boot before the file servers.

In the second case, it's a failure you cannot recover from.
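
As a rough sketch of that distinction in terms of getaddrinfo(3) return values: EAI_AGAIN is the code documented as a temporary failure, so it is the only one treated as worth retrying here. The enum and the function classify_eai are hypothetical names for this example. Treating every other error as final is an assumption, and as noted earlier in the thread, the code actually returned while DNS is not yet running may not be EAI_AGAIN, so this mapping is not reliable on its own.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* What a caller might do after a getaddrinfo(3) call (hypothetical). */
enum resolve_verdict {
	RESOLVE_OK,		/* lookup succeeded */
	RESOLVE_RETRY,		/* name service did not answer; try later */
	RESOLVE_GIVE_UP		/* name does not resolve; retrying is pointless */
};

/*
 * Hypothetical classifier for a getaddrinfo(3) return value.
 * Only EAI_AGAIN is mapped to "retry"; every other error, including
 * EAI_NONAME and EAI_FAIL, is assumed to be unrecoverable.
 */
static enum resolve_verdict
classify_eai(int error)
{
	switch (error) {
	case 0:
		return (RESOLVE_OK);
	case EAI_AGAIN:
		return (RESOLVE_RETRY);
	default:
		return (RESOLVE_GIVE_UP);
	}
}
```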


rgds,

toomas




