Re: Module variable initialization
- In reply to: Zhenlei Huang : "Re: Module variable initialization"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Thu, 12 Dec 2024 16:27:20 UTC
On Thu, Dec 12, 2024 at 1:46 AM Zhenlei Huang <zlei@freebsd.org> wrote: > > > > On Dec 12, 2024, at 10:44 AM, Rick Macklem <rick.macklem@gmail.com> wrote: > > Hi, > > Bugzilla pr#282156 reports a crash that appears to be caused by > a NFS client variable (nfscbd_pool) not being initialized when a > NFS mount is done. > > Now, the NFS client module (nfscl.ko) is weird in that it has > two definitions for the module. There is a VFS_SET() one for > the file system and a separate DECLARE_MODULE() for nfscl. > (The latter exists so that the module can refuse to unload and > define dependencies on other modules.) > > The variable (nfscbd_pool) is initialized in the modevent() function > for nfscl in the MOD_LOAD section. > > > Does anyone know if this can somehow result in the variable not > being initialized when an NFS mount occurs? > > > I'm not familiar with NFS. From a quick look of the source code I think > `nfscbd_pool` is correctly initialized. > > I do not know the exact version pr#282156, so I guess and tried 14.1-p1, > ``` > $ addr2line -fip -e /.zfs/snapshot/14.1-p1/usr/lib/debug/boot/kernel/kernel.debug 0xffffffff80e1c558 > svc_run at /usr/src/sys/rpc/svc.c:1414 > ``` > > https://cgit.freebsd.org/src/tree/sys/rpc/svc.c?h=releng/14.1&id=0892dff104440867956a53e78c12d66090fec36b#n1414 > > If `nfscbd_pool` is NULL, then I expect the panic should happens earlier. Say line 1405 or event earlier line 1389 . This is not a panic most see. I haven't been able to reproduce it and I think it has been reported by one other person. However, if you look at 282156, you'll see that the address is d0 (208). This is almost guaranteed to be a NULL pointer to a structure. I looked at the structures accessed via the callback stuff and the only one with a field at offset 208 is SVCPOOL. The code does use nfscbd_pool in the NFS common code, where it sets xprt->xp_pool = nfscbd_pool in svc_vc_create_backchannel(), which is called when a TCP connection for a mount is created. --> So, one theory is that the code somehow does this before nfscbd_pool is set non-null. --> Another would be that some runaway pointer set it back to NULL. Anyhow, I will probably get the reporter to add some sanity checks to their code to try and see what the results of that are. > > Maybe `svc_run_internal()` is to be blamed ? In a sense, yes, in that the crash occurs in there. However, this code is worked heavily for the NFS server and doesn't demonstrate issues that I am aware of. Thanks for your comments, rick > > > And, if the above is possible, would doing the initialization in the > vfs_init function for VFS_SET() be guaranteed to happen before > a mount is done? > > > The order of modules seems right to me. nfscl module has order SI_ORDER_FIRST > and VFS_SET(... nfs ... ) has SI_ORDER_MIDDLE. One additional issue is that nfscbd_pool is actually declared in the nfscommon module and the use of it that might "save" the NULL value in xp_pool before it gets set non-NULL is also in nfscommon. Maybe there is some memory cache issue where the stale (set to NULL) value for nfscbd_pool persists in the nfscommon module for long enough to cause this? (It is a stretch, but I am pretty well running out of thoughts on this.) --> I could move the initialization into the nfscommon module easily enough. I might create such a patch for the reporter to try. Thanks for your comments, rick > > > Thanks for any help with this, rick > > > Best regards, > Zhenlei >