Re: nfs hang

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Fri, 14 Nov 2025 17:54:39 UTC

On Fri, Nov 14, 2025 at 5:06 AM Mark Millard <marklmi@yahoo.com> wrote:
>
> On Nov 14, 2025, at 03:53, Rick Macklem <rick.macklem@gmail.com> wrote:
>
> > On Thu, Nov 13, 2025 at 4:51 PM Mark Millard <marklmi@yahoo.com> wrote:
> >>
> >> Ronald Klop <ronald_at_FreeBSD.org> wrote on
> >> Date: Thu, 13 Nov 2025 17:17:48 UTC :
> >>
> >>> Op 13-11-2025 om 14:06 schreef Rick Macklem:
> >>>> On Thu, Nov 13, 2025 at 2:45 AM Ronald Klop <ronald@freebsd.org> wrote:
> >>>>>
> >>>>> Op 13-11-2025 om 11:41 schreef Ronald Klop:
> >>>>>> . . .
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> . . .
> >>>> Do you have more than one client mounting the file system?
> >>>> If you do, make sure they all have different /etc/hostid's.
> >>>> (Cloning a system disk without deleting /etc/hostid can
> >>>> result in multiple clients with the same /etc/hostid. That
> >>>> means they are "the same client" to the NFSv4 server
> >>>> and that can cause the above.)
> >>>>
> >>>> If this is not the problem, I don't know why you'd see the
> >>>> above but I suspect the above explains the hang.
> >>>>
> >>>> rick
> >>>>
> >>>
> >>>
> >>> Two clients. Both have different /etc/hostid.
> >>>
> >>> I noticed that the procstat stacks start with "null_reclaim". And poudriere null-mounts the nfs mounts in the poudriere-jails.
> >>
> >> Do the poudriere jails on each host use the host's /etc/hostid (by content)?
> >>
> >> Any worries about needing poudriere jail /etc/hostid content uniqueness?
> > From NFSv4's point of view, /etc/hostid is used at mount time.
> > The client will use the one outside of any jail, since that is where the
> > mount is done from.
> >
> > Now, although I was thinking about the client (since that is where
> > the hangs occur), it might be an issue if you were running the nfsd
> > in multiple jails that have the same /etc/hostid as well.
> > --> The server identifies itself to the client via the /etc/hostid and
> >     a different identity means different server.
> >     However, it could be a problem if the servers in different jails
> >     return the "same server" from the same /etc/hostid.
> >
> > I'll admit it has been years since I did the "run nfsd in a vnet jail"
> > code and I don't remember if the use of /etc/hostid is vnet'd or
> > not.
> >
> > rick
>
> Just for reference:
>
> In two poudriere-devel bulk runs from this month for
> a ZFS context I've gotten notice sequences like:
>
> Nov  4 22:32:09 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/11 exports entire /usr/local/poudriere/data/.m file system
The only thing that has changed is that it now tells you that
the entire file system is exported instead of doing so silently.
Exports in the kernel are and have always been "per file system".
NFSv3 (and only NFSv3) has something called "administrative controls"
which restricts which directories can get a file handle via the Mount RPC
(an NFSv3 sideband protocol).
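
To make the "per file system" point concrete, here is a minimal sketch
(not mountd code) that finds the mount point containing a directory.
The function name exported_file_system is hypothetical, and it assumes
the mount point is a reasonable stand-in for "the file system that
actually gets exported" when a subdirectory is listed in exports.

```python
import os


def exported_file_system(path):
    """Return the mount point of the file system that contains path."""
    current = os.path.realpath(path)
    # Walk toward the root until a mount point is reached. The root
    # directory is always a mount point, so the loop terminates.
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current
```

Any two directories for which this returns the same mount point would,
under that assumption, be covered by the same kernel export.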

>
> and various associated (same or next second for timestamp) messages
> like:
>
> Nov  4 22:32:09 7950X3D-ZFS mountd[3870]: bad exports list line '/usr/local/poudriere/data/.m/main-amd64-default/14': /usr/local/poudriere/data/.m/main-amd64-default/14: lstat() failed: No such file or directory.

This one indicates that the directory path does not exist. (The path
also cannot have symbolic links in it.) Again, this has been the case
for literally decades.

I cannot even remember why symbolic links aren't allowed, because
the decision was made decades ago.
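
A rough sketch of that kind of check, for anyone who wants to test a
path before putting it in exports. This is only an illustration, not
the mountd implementation; check_export_path and its messages are
made up here. It calls lstat() on each component so that a symbolic
link is seen as a link rather than followed.

```python
import os
import stat


def check_export_path(path):
    """Return None if every component is a real directory, else an error string."""
    if not os.path.isabs(path):
        return f"{path}: not an absolute path"
    current = os.sep
    for component in path.split(os.sep):
        if not component:
            continue
        current = os.path.join(current, component)
        try:
            st = os.lstat(current)  # lstat() does not follow symbolic links
        except OSError as exc:
            return f"{current}: lstat() failed: {exc.strerror}"
        if stat.S_ISLNK(st.st_mode):
            return f"{current}: path is a symbolic link"
        if not stat.S_ISDIR(st.st_mode):
            return f"{current}: path is not a directory"
    return None
```

A builder directory that has already been removed would fail at the
lstat() step, which matches the "No such file or directory" case above.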

>
> followed by the likes of:
>
> Nov  4 22:32:10 7950X3D-ZFS mountd[3870]: can't change attributes for /usr/local/poudriere/data/.m: netcred already exists for given addr/mask

You cannot export a file system multiple times to the same host(s)/subnet.
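
As a hedged illustration of that rule (not how mountd or the kernel
does it), the sketch below flags a second export of the same file
system to the same addr/mask. find_duplicate_exports is a hypothetical
helper, and the entries are assumed to name the file system's mount
point rather than a subdirectory, since exports are per file system.

```python
import ipaddress


def find_duplicate_exports(entries):
    """Return the (file_system, network) pairs that repeat an earlier one.

    entries is an iterable of (file_system, network) pairs, for example
    ("/data", "192.168.1.0/24").
    """
    seen = set()
    duplicates = []
    for file_system, network in entries:
        # strict=False masks off host bits, so 192.168.1.7/24 and
        # 192.168.1.0/24 compare as the same addr/mask.
        key = (file_system, ipaddress.ip_network(network, strict=False))
        if key in seen:
            duplicates.append((file_system, network))
        else:
            seen.add(key)
    return duplicates
```

Two exports lines that name different directories inside one file
system but the same hosts would reduce to the same pair here.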

rick

>
> Looks to be timed just after the completion of the last builder:
>
> =>> Cleaning up wrkdir
> ===>  Cleaning for llvm21-21.1.4
> build of devel/llvm21@default | llvm21-21.1.4 ended at 2025-11-04T22:32:09-08:00
> build time: 00:29:47
>
> The other example is similar:
>
> Nov  5 06:40:12 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/04 exports entire /usr/local/poudriere/data/.m file system
>
> and:
>
> =>> Cleaning up wrkdir
> ===>  Cleaning for m4-1.4.20,1
> build of devel/m4 | m4-1.4.20,1 ended at 2025-11-05T06:40:11-08:00
> build time: 00:00:32
>
> In both cases the "Warning: exporting" line lists a Job Id (11 and 04) that was
> associated with an earlier builder's activity.
>
> I'm not aware of getting such historically.
>
> The jail involved currently reports as:
>
> # poudriere jail -l
> JAILNAME                   VERSION       OSVERSION ARCH  METHOD    TIMESTAMP           PATH
> . . .
> main-amd64                 16.0-CURRENT            amd64 pkgbase   2025-11-11 20:05:44 /usr/local/poudriere/jails/main-amd64
> . . .
>
> And (no poudriere run active):
>
> # zfs list zoptb/poudriere/data/.m
> NAME                      USED  AVAIL  REFER  MOUNTPOINT
> zoptb/poudriere/data/.m    96K   753G    96K  /usr/local/poudriere/data/.m
>
>
> >>
> >>> Could nfs+nullfs give some trouble? Or maybe it is just nullfs that hangs everything and the nfs stuff is just a result of it.
> >>>
> >>> At the same moment I had git hanging on a non-NFS mount. See attachment for the procstat which also includes nfs-calls.
> >
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
>