Re: nfs hang

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 14 Nov 2025 13:05:57 UTC

On Nov 14, 2025, at 03:53, Rick Macklem <rick.macklem@gmail.com> wrote:

> On Thu, Nov 13, 2025 at 4:51 PM Mark Millard <marklmi@yahoo.com> wrote:
>> 
>> Ronald Klop <ronald_at_FreeBSD.org> wrote on
>> Date: Thu, 13 Nov 2025 17:17:48 UTC :
>> 
>>> Op 13-11-2025 om 14:06 schreef Rick Macklem:
>>>> On Thu, Nov 13, 2025 at 2:45 AM Ronald Klop <ronald@freebsd.org> wrote:
>>>>> 
>>>>> Op 13-11-2025 om 11:41 schreef Ronald Klop:
>>>>>> . . .
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> . . .
>>>> Do you have more than one client mounting the file system?
>>>> If you do, make sure they all have different /etc/hostid's.
>>>> (Cloning a system disk without deleting /etc/hostid can
>>>> result in multiple clients with the same /etc/hostid. That
>>>> means they are "the same client" to the NFSv4 server
>>>> and that can cause the above.)
>>>> 
>>>> If this is not the problem, I don't know why you'd see the
>>>> above but I suspect the above explains the hang.
>>>> 
>>>> rick
>>>> 
>>> 
>>> 
>>> Two clients. Both have different /etc/hostid.
>>> 
>>> I noticed that the procstat stacks start with "null_reclaim". And poudriere null-mounts the nfs mounts in the poudriere-jails.
>> 
>> Do the poudriere jails on each host use the host's /etc/hostid (by content)?
>> 
>> Any worries about needing poudriere jail /etc/hostid content uniqueness?
> From NFSv4's point of view, /etc/hostid is used at mount time.
> The client will use the one outside of any jail, since that is where the
> mount is done from.
> 
> Now, although I was thinking about the client (since that is where
> the hangs occur), it might be an issue if you were running the nfsd
> in multiple jails that have the same /etc/hostid as well.
> --> The server identifies itself to the client via the /etc/hostid and
>     a different identity means different server.
>     However, it could be a problem if the servers in different jails
>     return the "same server" from the same /etc/hostid.
> 
> I'll admit it has been years since I did the "run nfsd in a vnet jail"
> code and I don't remember if the use of /etc/hostid is vnet'd or
> not.
> 
> rick
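
For anyone wanting to rule out the duplicate /etc/hostid case across
several clients, here is a minimal sketch of the comparison, assuming
the content of each client's /etc/hostid has already been collected
into a dict by hand. The function name, client names and hostid values
are hypothetical placeholders, not taken from any of the systems in
this thread:

```python
from collections import defaultdict


def find_duplicate_hostids(hostids):
    """Return {hostid: [clients]} for every hostid used by more than one client.

    `hostids` maps a client name to the text read from that client's
    /etc/hostid (assumed to have been gathered separately).
    """
    by_id = defaultdict(list)
    for client, hostid in hostids.items():
        # Ignore surrounding whitespace such as the trailing newline.
        by_id[hostid.strip()].append(client)
    return {hid: sorted(names) for hid, names in by_id.items() if len(names) > 1}


# Hypothetical example: clientB was cloned from clientA without
# deleting /etc/hostid first.
collected = {
    "clientA": "11111111-2222-3333-4444-555555555555\n",
    "clientB": "11111111-2222-3333-4444-555555555555\n",
    "clientC": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee\n",
}
duplicates = find_duplicate_hostids(collected)
```

An empty result would only say that the collected values differ. It
says nothing about the nfsd-in-jails question above.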

Just for reference:

In two poudriere-devel bulk runs from this month, in a ZFS
context, I've gotten sequences of mountd notices like:

Nov  4 22:32:09 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/11 exports entire /usr/local/poudriere/data/.m file system

and various associated (same or next second for timestamp) messages
like:

Nov  4 22:32:09 7950X3D-ZFS mountd[3870]: bad exports list line '/usr/local/poudriere/data/.m/main-amd64-default/14': /usr/local/poudriere/data/.m/main-amd64-default/14: lstat() failed: No such file or directory.

followed by the likes of:

Nov  4 22:32:10 7950X3D-ZFS mountd[3870]: can't change attributes for /usr/local/poudriere/data/.m: netcred already exists for given addr/mask

Looks to be timed just after the completion of the last builder:

=>> Cleaning up wrkdir
===>  Cleaning for llvm21-21.1.4
build of devel/llvm21@default | llvm21-21.1.4 ended at 2025-11-04T22:32:09-08:00
build time: 00:29:47

The other example is similar:

Nov  5 06:40:12 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/04 exports entire /usr/local/poudriere/data/.m file system

and:

=>> Cleaning up wrkdir
===>  Cleaning for m4-1.4.20,1
build of devel/m4 | m4-1.4.20,1 ended at 2025-11-05T06:40:11-08:00
build time: 00:00:32

In both cases the "Warning: exporting" line lists a Job Id (11 and 04) that was
associated with an earlier builder's activity.
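
To line such warnings up against builder activity, the Job Id can be
pulled out of each "Warning: exporting" line. The following is a rough
sketch only, assuming the message layout matches the two examples
above and that the last path component is the builder's Job Id. The
regular expression and the function name are my own, not anything from
mountd or poudriere:

```python
import re

# Assumed layout, based only on the two syslog examples quoted above.
EXPORT_WARNING = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+mountd\[\d+\]:\s+"
    r"Warning: exporting (?P<path>\S+) exports entire (?P<fs>\S+) file system$"
)


def parse_export_warning(line):
    """Return (timestamp, job_id, path) for a matching line, else None."""
    m = EXPORT_WARNING.match(line.strip())
    if m is None:
        return None
    path = m.group("path")
    # Assumption: the final path component is the builder's Job Id.
    job_id = path.rstrip("/").rsplit("/", 1)[-1]
    return m.group("stamp"), job_id, path


sample = (
    "Nov  4 22:32:09 7950X3D-ZFS mountd[3870]: Warning: exporting "
    "/usr/local/poudriere/data/.m/main-amd64-default/11 exports entire "
    "/usr/local/poudriere/data/.m file system"
)
parsed = parse_export_warning(sample)
```

The other mountd messages (the "bad exports list line" and "can't
change attributes" ones) do not match and come back as None.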

I'm not aware of having gotten such messages before this month.

The jail involved currently reports as:

# poudriere jail -l
JAILNAME                   VERSION       OSVERSION ARCH  METHOD    TIMESTAMP           PATH
. . .
main-amd64                 16.0-CURRENT            amd64 pkgbase   2025-11-11 20:05:44 /usr/local/poudriere/jails/main-amd64
. . .

And (no poudriere run active):

# zfs list zoptb/poudriere/data/.m
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zoptb/poudriere/data/.m    96K   753G    96K  /usr/local/poudriere/data/.m


>> 
>>> Could nfs+nullfs give some trouble? Or maybe it is just nullfs that hangs everything and the nfs stuff is just a result of it.
>>> 
>>> At the same moment I had git hanging on a non-NFS mount. See attachment for the procstat which also includes nfs-calls.
> 


===
Mark Millard
marklmi at yahoo.com