Re: nfs hang

From: Ronald Klop <ronald_at_FreeBSD.org>
Date: Thu, 13 Nov 2025 17:17:48 UTC

On 13-11-2025 at 14:06, Rick Macklem wrote:
> On Thu, Nov 13, 2025 at 2:45 AM Ronald Klop <ronald@freebsd.org> wrote:
>>
>> On 13-11-2025 at 11:41, Ronald Klop wrote:
>>> Hi,
>>>
>>> I have set up nfsd in a jail. It exports ZFS file systems. The kernel is 16-CURRENT/aarch64; the jails are 14.3-RELEASE.
>>> $ cat /data/jails/pkg/_root/etc/exports
>>> V4:    /    -sec=sys
>>>
>>> /usr/local/poudriere/data/logs/bulk    -sec=sys -maproot=root
>>> /usr/local/poudriere/data/packages    -sec=sys -maproot=root
>>>
>>> /usr/ports    -sec=sys
>>>
>>>
>>> The clients run poudriere in jails.
>>>
>>> Now and then I get hanging processes and messages about an unresponsive NFS server.
>>>
>>> All NFS threads are in this state:
>>> [root@rpi4 ~]# procstat -kk 5973
>>>     PID    TID COMM                TDNAME              KSTACK
>>>    5973 100541 nfsd                nfsd: master        mi_switch+0x100 sleepq_catch_signals+0x3e4 sleepq_timedwait_sig+0x18 _sleep+0x1a0 clnt_vc_call+0x814 clnt_reconnect_call+0x960 newnfs_request+0xacc nfsrpc_closerpc+0xfc nfscl_tryclose+0x58 nfsrpc_doclose+0x294 nfscl_doclose+0x390 nfsrpc_close+0x28 ncl_inactive+0x14c vop_sigdefer+0x34 vinactivef+0xb8 vput_final+0x1f4 null_reclaim+0x1a0 VOP_RECLAIM_APV+0x20
>>>    5973 100784 nfsd                nfsd: service       mi_switch+0x100 sleepq_catch_signals+0x3e4 sleepq_timedwait_sig+0x18 _sleep+0x1a0 clnt_vc_call+0x814 clnt_reconnect_call+0x960 newnfs_request+0xacc nfsrpc_closerpc+0xfc nfscl_tryclose+0x58 nfsrpc_doclose+0x294 nfscl_doclose+0x390 nfsrpc_close+0x28 ncl_inactive+0x14c vop_sigdefer+0x34 vinactivef+0xb8 vput_final+0x1f4 null_reclaim+0x1a0 VOP_RECLAIM_APV+0x20
>>>    5973 100785 nfsd                nfsd: service       mi_switch+0x100 sleepq_catch_signals+0x3e4 sleepq_timedwait_sig+0x18 _sleep+0x1a0 clnt_vc_call+0x814 clnt_reconnect_call+0x960 newnfs_request+0xacc nfsrpc_closerpc+0xfc nfscl_tryclose+0x58 nfsrpc_doclose+0x294 nfscl_doclose+0x390 nfsrpc_close+0x28 ncl_inactive+0x14c vop_sigdefer+0x34 vinactivef+0xb8 vput_final+0x1f4 null_reclaim+0x1a0 VOP_RECLAIM_APV+0x20
>>>    5973 100786 nfsd                nfsd: service       mi_switch+0x100 sleepq_catch_signals+0x3e4 sleepq_timedwait_sig+0x18 _sleep+0x1a0 clnt_vc_call+0x814 clnt_reconnect_call+0x960 newnfs_request+0xacc nfsrpc_closerpc+0xfc nfscl_tryclose+0x58 nfsrpc_doclose+0x294 nfscl_doclose+0x390 nfsrpc_close+0x28 ncl_inactive+0x14c vop_sigdefer+0x34 vinactivef+0xb8 vput_final+0x1f4 null_reclaim+0x1a0 VOP_RECLAIM_APV+0x20
>>> ... and a couple more similar lines ...
>>>
>>> In rc.conf:
>>> nfs_server_enable=YES
>>> mountd_enable=YES
>>> nfsv4_server_only=YES
>>> nfs_server_flags="-t"
>>>
>>> The file systems are ZFS legacy mounts in the jail:
>>> # grep zfs /data/jails/pkg/fstab
>>> zrpi4/data/poudriere-logs-bulk    /data/jails/pkg/_root/usr/local/poudriere/data/logs/bulk    zfs    rw    0    0
>>> zrpi4/data/poudriere-packages    /data/jails/pkg/_root/usr/local/poudriere/data/packages    zfs    rw    0    0
>>> zdata4/ports    /data/jails/pkg/_root/usr/ports    zfs    rw    0    0
>>>
>>>
>>> Interestingly, I also have a hanging bash process that should not be accessing NFS at the moment:
>>> # procstat -kk 83175
>>>     PID    TID COMM                TDNAME              KSTACK
>>> 83175 111203 bash                -                   mi_switch+0x100 sleeplk+0xf8 lockmgr_slock_hard+0x29c _vn_lock+0x50 vget_finish+0x28 cache_fplookup_final_child+0x54 cache_fplookup+0x538 namei+0xd8 kern_statat+0xd4 sys_fstatat+0x2c do_el0_sync+0x6b4 handle_el0_sync+0x4c
>>>
>>> Any thoughts?
>>>
>>> Regards,
>>> Ronald.
>>>
>>>
>>
>>
>> Just noticed this on the console:
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid b7c1283e:5166a2de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid 3750dc87:289259de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid 263bc0f2:d39c94de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid b7c1283e:5166a2de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid 3750dc87:289259de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid 263bc0f2:d39c94de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid b7c1283e:5166a2de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid 3750dc87:289259de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid b7c1283e:5166a2de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: server 'pkg.thuis.klop.ws' error: fileid changed. fsid 3750dc87:289259de: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
>> newnfs: Logged 10 times about fileid corruption; going quiet to avoid spamming logs excessively. (Limit is: 10).
> Do you have more than one client mounting the file system?
> If you do, make sure they all have different /etc/hostid's.
> (Cloning a system disk without deleting /etc/hostid can
> result in multiple clients with the same /etc/hostid. That
> means they are "the same client" to the NFSv4 server
> and that can cause the above.)
> 
> If this is not the problem, I don't know why you'd see the
> above but I suspect the above explains the hang.
> 
> rick
> 


There are two clients, and each has a different /etc/hostid.
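
For anyone who wants to repeat the hostid check across more clients, here is a minimal sketch. The function name and the client names in the usage comment are hypothetical; it only assumes that each client's /etc/hostid can be read and printed next to a host label.

```shell
# find_duplicate_hostids (hypothetical helper): reads "host hostid" pairs,
# one per line, on stdin and prints every hostid used by more than one host.
find_duplicate_hostids() {
    awk '{ count[$2]++; hosts[$2] = hosts[$2] " " $1 }
         END { for (id in count) if (count[id] > 1) print id ":" hosts[id] }'
}

# Usage sketch (client1 and client2 are placeholder host names):
#   for h in client1 client2; do
#       printf '%s %s\n' "$h" "$(ssh "$h" cat /etc/hostid)"
#   done | find_duplicate_hostids
```

No output means every listed client has its own hostid.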

I noticed that the call chains in the procstat stacks originate from "null_reclaim" (the outermost frames shown). Poudriere null-mounts the NFS mounts into the poudriere jails.

Could the combination of NFS and nullfs cause trouble? Or maybe nullfs alone hangs everything and the NFS symptoms are just a consequence of that.
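
To see how many threads are stuck on this nullfs-to-NFS path, a rough filter over the procstat output could look like the sketch below. The function name is made up, and matching on the "null_reclaim" and "ncl_inactive" frames is only an assumption based on the stacks quoted above.

```shell
# nullfs_nfs_threads (hypothetical helper): reads `procstat -kk` output on
# stdin and prints "PID TID" for each thread whose kernel stack contains both
# a nullfs reclaim frame and an NFS client inactive frame.
nullfs_nfs_threads() {
    awk '/null_reclaim/ && /ncl_inactive/ { print $1, $2 }'
}

# Usage sketch, over all processes on the machine:
#   procstat -kk -a | nullfs_nfs_threads
```

Threads blocked elsewhere, such as the bash process waiting in _vn_lock, are not matched by this filter.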

At the same time, git was hanging on a non-NFS mount. See the attachment for its procstat output, which also includes NFS calls.

Regards,
Ronald.