Re: nfs hang
- In reply to: Mark Millard : "Re: nfs hang"
Date: Fri, 14 Nov 2025 17:54:39 UTC
On Fri, Nov 14, 2025 at 5:06 AM Mark Millard <marklmi@yahoo.com> wrote:
>
> On Nov 14, 2025, at 03:53, Rick Macklem <rick.macklem@gmail.com> wrote:
>
> > On Thu, Nov 13, 2025 at 4:51 PM Mark Millard <marklmi@yahoo.com> wrote:
> >>
> >> Ronald Klop <ronald_at_FreeBSD.org> wrote on
> >> Date: Thu, 13 Nov 2025 17:17:48 UTC :
> >>
> >>> Op 13-11-2025 om 14:06 schreef Rick Macklem:
> >>>> On Thu, Nov 13, 2025 at 2:45 AM Ronald Klop <ronald@freebsd.org> wrote:
> >>>>>
> >>>>> Op 13-11-2025 om 11:41 schreef Ronald Klop:
> >>>>>> . . .
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> . . .
> >>>> Do you have more than one client mounting the file system?
> >>>> If you do, make sure they all have different /etc/hostid's.
> >>>> (Cloning a system disk without deleting /etc/hostid can
> >>>> result in multiple clients with the same /etc/hostid. That
> >>>> means they are "the same client" to the NFSv4 server
> >>>> and that can cause the above.)
> >>>>
> >>>> If this is not the problem, I don't know why you'd see the
> >>>> above but I suspect the above explains the hang.
> >>>>
> >>>> rick
> >>>>
> >>>
> >>>
> >>> Two clients. Both have different /etc/hostid.
> >>>
> >>> I noticed that the procstat stacks start with "null_reclaim". And poudriere null-mounts the nfs mounts in the poudriere-jails.
> >>
> >> Do the poudriere jails on each host use the host's /etc/hostid (by content)?
> >>
> >> Any worries about needing poudriere jail /etc/hostid content uniqueness?
> > From NFSv4's point of view, /etc/hostid is used at mount time.
> > The client will use the one outside of any jail, since that is where the
> > mount is done from.
> >
> > Now, although I was thinking about the client (since that is where
> > the hangs occur), it might be an issue if you were running the nfsd
> > in multiple jails that have the same /etc/hostid as well.
> > --> The server identifies itself to the client via the /etc/hostid and
> > a different identity means different server.
> > However, it could be a problem if the servers in different jails
> > return the "same server" from the same /etc/hostid.
> >
> > I'll admit it has been years since I did the "run nfsd in a vnet jail"
> > code and I don't remember if the use of /etc/hostid is vnet'd or
> > not.
> >
> > rick
>
> Just for reference:
>
> In two poudriere-devel bulk runs from this month for
> a ZFS context I've gotten notice sequences like:
>
> Nov 4 22:32:09 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/11 exports entire /usr/local/poudriere/data/.m file system

The only thing that has changed is that it now tells you that the entire
file system is exported instead of doing so silently.
Exports in the kernel are and have always been "per file system".
NFSv3 (and only NFSv3) has something called "administrative controls"
which restricts which directories can get a file handle via the Mount RPC
(an NFSv3 sideband protocol).

>
> and various associated (same or next second for timestamp) messages
> like:
>
> Nov 4 22:32:09 7950X3D-ZFS mountd[3870]: bad exports list line '/usr/local/poudriere/data/.m/main-amd64-default/14': /usr/local/poudriere/data/.m/main-amd64-default/14: lstat() failed: No such file or directory.

This one indicates that the directory path does not exist (it cannot have
symbolic links in it). Again, this has been the case for literally decades.
I cannot even remember why symbolic links aren't allowed, because the
decision was made decades ago.

>
> followed by the likes of:
>
> Nov 4 22:32:10 7950X3D-ZFS mountd[3870]: can't change attributes for /usr/local/poudriere/data/.m: netcred already exists for given addr/mask

You cannot export a file system multiple times to the same host(s)/subnet.
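
For anyone wanting to pre-check an exports file against the "lstat() failed" case above, here is a minimal sketch, not anything mountd itself ships. The function name check_export_paths is made up for illustration. It only looks at the first field of each line (real exports lines may list several directories) and it only detects a symbolic link in the final path component, so treat it as a rough filter rather than a reimplementation of mountd's checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of mountd): flag export paths that are
# missing or whose final component is a symbolic link.
# Simplification: only the first whitespace-separated field of each line
# is examined.
check_export_paths() {
    exports_file=$1
    bad=0
    while read -r path _rest || [ -n "$path" ]; do
        case $path in
            /*) ;;          # absolute path: examine it
            *) continue ;;  # comments, blank lines, V4: lines, etc.
        esac
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            bad=1
        elif [ -L "$path" ]; then
            echo "symlink: $path"
            bad=1
        fi
    done < "$exports_file"
    return $bad
}
```

Something like `check_export_paths /etc/exports` would then print one line per suspect path and return non-zero if any were found. It says nothing about the "netcred already exists" case, which depends on host/subnet overlap within one file system.
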
rick

>
> Looks to be timed just after the completion of the last builder:
>
> =>> Cleaning up wrkdir
> ===> Cleaning for llvm21-21.1.4
> build of devel/llvm21@default | llvm21-21.1.4 ended at 2025-11-04T22:32:09-08:00
> build time: 00:29:47
>
> The other example is similar:
>
> Nov 5 06:40:12 7950X3D-ZFS mountd[3870]: Warning: exporting /usr/local/poudriere/data/.m/main-amd64-default/04 exports entire /usr/local/poudriere/data/.m file system
>
> and:
>
> =>> Cleaning up wrkdir
> ===> Cleaning for m4-1.4.20,1
> build of devel/m4 | m4-1.4.20,1 ended at 2025-11-05T06:40:11-08:00
> build time: 00:00:32
>
> In both cases the "Warning: exporting" line lists a Job Id (11 and 04) that was
> associated with an earlier builder's activity.
>
> I'm not aware of getting such historically.
>
> The jail involved currently reports as:
>
> # poudriere jail -l
> JAILNAME   VERSION      OSVERSION ARCH  METHOD  TIMESTAMP           PATH
> . . .
> main-amd64 16.0-CURRENT           amd64 pkgbase 2025-11-11 20:05:44 /usr/local/poudriere/jails/main-amd64
> . . .
>
> And (no poudriere run active):
>
> # zfs list zoptb/poudriere/data/.m
> NAME                      USED  AVAIL  REFER  MOUNTPOINT
> zoptb/poudriere/data/.m    96K   753G    96K  /usr/local/poudriere/data/.m
>
>
> >>
> >>> Could nfs+nullfs give some trouble? Or maybe it is just nullfs that hangs everything and the nfs stuff is just a result of it.
> >>>
> >>> At the same moment I had git hanging on a non-NFS mount. See attachment for the procstat which also includes nfs-calls.
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
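
On the /etc/hostid uniqueness question earlier in the thread: with more than a couple of clients, comparing the files by eye gets tedious. The following is a rough sketch of one way to spot duplicates, assuming you can collect "client hostid" pairs yourself. The function name find_duplicate_hostids and the client names in the usage comment are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "client hostid" pairs on stdin and print every
# hostid that is shared by more than one client, followed by those clients.
find_duplicate_hostids() {
    awk '
        NF >= 2 {
            # Append the client name to the list kept for this hostid.
            if (count[$2]++)
                clients[$2] = clients[$2] " " $1
            else
                clients[$2] = $1
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id ": " clients[id]
        }
    ' | sort
}

# Possible usage (client1 and client2 are placeholder host names):
#   for h in client1 client2; do
#       printf '%s %s\n' "$h" "$(ssh "$h" cat /etc/hostid)"
#   done | find_duplicate_hostids
```

No output means every collected hostid was distinct. This only covers the client-side duplication described above, not the nfsd-in-jails case.
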