Re: nfs hang
Date: Fri, 14 Nov 2025 13:05:57 UTC
On Nov 14, 2025, at 03:53, Rick Macklem <rick.macklem@gmail.com> wrote:

> On Thu, Nov 13, 2025 at 4:51 PM Mark Millard <marklmi@yahoo.com> wrote:
>>
>> Ronald Klop <ronald_at_FreeBSD.org> wrote on
>> Date: Thu, 13 Nov 2025 17:17:48 UTC :
>>
>>> On 13-11-2025 at 14:06, Rick Macklem wrote:
>>>> On Thu, Nov 13, 2025 at 2:45 AM Ronald Klop <ronald@freebsd.org> wrote:
>>>>>
>>>>> On 13-11-2025 at 11:41, Ronald Klop wrote:
>>>>>> . . .
>>>>>>
>>>>>
>>>>> . . .
>>>> Do you have more than one client mounting the file system?
>>>> If you do, make sure they all have different /etc/hostid's.
>>>> (Cloning a system disk without deleting /etc/hostid can
>>>> result in multiple clients with the same /etc/hostid. That
>>>> means they are "the same client" to the NFSv4 server
>>>> and that can cause the above.)
>>>>
>>>> If this is not the problem, I don't know why you'd see the
>>>> above but I suspect the above explains the hang.
>>>>
>>>> rick
>>>>
>>>
>>> Two clients. Both have different /etc/hostid.
>>>
>>> I noticed that the procstat stacks start with "null_reclaim". And
>>> poudriere null-mounts the nfs mounts in the poudriere-jails.
>>
>> Do the poudriere jails on each host use the host's /etc/hostid (by content)?
>>
>> Any worries about needing poudriere jail /etc/hostid content uniqueness?
> From NFSv4's point of view, /etc/hostid is used at mount time.
> The client will use the one outside of any jail, since that is where the
> mount is done from.
>
> Now, although I was thinking about the client (since that is where
> the hangs occur), it might be an issue if you were running the nfsd
> in multiple jails that have the same /etc/hostid as well.
> --> The server identifies itself to the client via the /etc/hostid and
>     a different identity means different server.
>     However, it could be a problem if the servers in different jails
>     return the "same server" from the same /etc/hostid.
>
> I'll admit it has been years since I did the "run nfsd in a vnet jail"
> code and I don't remember if the use of /etc/hostid is vnet'd or
> not.
>
> rick

Just for reference:

In two poudriere-devel bulk runs from this month for a ZFS context
I've gotten notice sequences like:

Nov 4 22:32:09 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/11 exports entire /usr/local/poudriere/data/.m file system

and various associated (same or next second for timestamp) messages like:

Nov 4 22:32:09 7950X3D-ZFS mountd[3870]: bad exports list line '/usr/local/poudriere/data/.m/main-amd64-default/14': /usr/local/poudriere/data/.m/main-amd64-default/14: lstat() failed: No such file or directory.

followed by the likes of:

Nov 4 22:32:10 7950X3D-ZFS mountd[3870]: can't change attributes for /usr/local/poudriere/data/.m: netcred already exists for given addr/mask

Looks to be timed just after the completion of the last builder:

=>> Cleaning up wrkdir
===> Cleaning for llvm21-21.1.4
build of devel/llvm21@default | llvm21-21.1.4 ended at 2025-11-04T22:32:09-08:00
build time: 00:29:47

The other example is similar:

Nov 5 06:40:12 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/04 exports entire /usr/local/poudriere/data/.m file system

and:

=>> Cleaning up wrkdir
===> Cleaning for m4-1.4.20,1
build of devel/m4 | m4-1.4.20,1 ended at 2025-11-05T06:40:11-08:00
build time: 00:00:32

In both cases the "Warning: exporting" line lists a Job Id (11 and 04)
that was associated with an earlier builder's activity.

I'm not aware of getting such messages historically.

The jail involved currently reports as:

# poudriere jail -l
JAILNAME   VERSION      OSVERSION ARCH  METHOD  TIMESTAMP           PATH
. . .
main-amd64 16.0-CURRENT           amd64 pkgbase 2025-11-11 20:05:44 /usr/local/poudriere/jails/main-amd64
. . .

And (no poudriere run active):

# zfs list zoptb/poudriere/data/.m
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zoptb/poudriere/data/.m    96K   753G    96K  /usr/local/poudriere/data/.m

>>
>>> Could nfs+nullfs give some trouble? Or maybe it is just nullfs that
>>> hangs everything and the nfs stuff is just a result of it.
>>>
>>> At the same moment I had git hanging on a non-NFS mount. See attachment
>>> for the procstat which also includes nfs-calls.
>

===
Mark Millard
marklmi at yahoo.com
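
Editorial sketch (not part of the original message): the /etc/hostid
uniqueness check that Rick describes in the quoted text could be done
roughly as below. It assumes each client's /etc/hostid has already been
copied into one directory, one file per client, named after the client.
The function name find_dup_hostids and that directory layout are
hypothetical, chosen only for illustration; nothing like it appears in the
thread.

```shell
#!/bin/sh
# Hypothetical helper: report hostid values shared by more than one client.
# Assumes "$1" is a directory holding one copied /etc/hostid per client,
# each file named after its client (names without whitespace).
find_dup_hostids() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Use only the first line of the file, with whitespace stripped.
        id=$(head -n 1 "$f" | tr -d '[:space:]')
        [ -n "$id" ] || continue
        printf '%s %s\n' "$id" "$(basename "$f")"
    done | sort | awk '
        # Count clients per hostid and remember their names.
        { count[$1]++; names[$1] = names[$1] " " $2 }
        END { for (id in count) if (count[id] > 1) print id ":" names[id] }
    '
}
```

Running "find_dup_hostids ./hostids" prints one line per shared value,
followed by the clients that carry it, and prints nothing when every
collected hostid is distinct. It only compares the collected files; whether
a duplicate actually explains a given hang is a separate question.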