Re: Kernel modules not loading on 15-prerelease

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Thu, 04 Sep 2025 00:39:37 UTC
On Wed, Sep 03, 2025 at 10:08:46AM -0700, Mark Millard wrote:
> On Sep 3, 2025, at 07:56, bob prohaska <fbsd@www.zefox.net> wrote:
> 
> > On Tue, Sep 02, 2025 at 12:30:29PM -0700, Mark Millard wrote:
> >> 
> >> You might be able to see some of what is going on via
> >> a command like:
> >> 
> >> # kldxref -d /boot/kernel/ /boot/modules/ | grep -e kernel -e modules/ | less
> > 
> > Running that command to a file and searching for uftdi and filemon finds
> > 
> > 
> > /boot/kernel/filemon.ko
> >  depends on kernel.1500061 (1500061,1500061)
> > 
> > and 
> > /boot/kernel/uftdi.ko
> >  depends on kernel.1500061 (1500061,1500061)
> 
> Are you saying that most /boot/kernel/*.ko listed:
> 
> kernel.1500063 (1500063,1500063)
> 
> and that just those 2 did not? Vs.: that all
> /boot/kernel/*.ko listed:
> 
> kernel.1500061 (1500061,1500061)
> 
> ? Some other mix of some one way and others
> the other way?
> 
Most list "depends on kernel.1500061 (1500061,1500061)", perhaps
10% list 1500063, and one lists some of each (!):
/boot/kernel/usb.ko
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500061 (1500061,1500061)
  depends on kernel.1500063 (1500063,1500063)
  depends on kernel.1500063 (1500063,1500063)
The above looks wrong to me.....
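
To make the mix easier to see across all modules, the listing can be
tallied per module and kernel version. This is only a rough sketch: the
function name kld_kernel_versions is made up for illustration, and it
assumes the kldxref -d output has the shape quoted in this thread (a
path line followed by "depends on kernel.N" lines).

```sh
# Tally, per module, which kernel versions its dependency records name.
# Reads text on stdin in the format quoted above (assumed, not verified
# against every kldxref version).
kld_kernel_versions() {
    awk '
        /^\// { mod = $1; next }              # a line starting with "/" names a module file
        $1 == "depends" && $3 ~ /^kernel\./ {
            ver = substr($3, 8)               # drop the leading "kernel."
            seen[mod, ver]++
        }
        END {
            for (k in seen) {
                split(k, p, SUBSEP)
                print p[1], p[2], seen[k]     # module, version, record count
            }
        }
    ' | sort
}

# Usage on the affected machine:
#   kldxref -d /boot/kernel/ /boot/modules/ | kld_kernel_versions
```

A module that appears on more than one output line (as usb.ko would
here) carries dependency records for more than one kernel version.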

> Going in a different direction, I'll remind:
> 
>      The environment of make(1) for the build can be controlled via the
>      SRC_ENV_CONF variable, which defaults to /etc/src-env.conf.  Some
>      examples that may only be set in this file are WITH_DIRDEPS_BUILD, and
>      WITH_META_MODE, and MAKEOBJDIRPREFIX as they are environment-only
>      variables.
> 
> I'll note that I use an env prefix before a make
> command (in a script):
> 
> env __MAKE_CONF="/usr/home/root/src.configs/make.conf" \
> SRCCONF="/dev/null" SRC_ENV_CONF="/usr/home/root/src.configs/src.conf.aarch64-nodbg-clang.aarch64-host" \
> MAKEOBJDIRPREFIX="/usr/obj/BUILDs/main-aarch64-usr_src-nodbg-clang" \
> WITH_META_MODE=yes \
> time -l make $*
> 
> You reference elsewhere using: -DWITH_META_MODE
> That looks wrong to me. For make:
> 
>      -D variable
>              Define variable to be 1, in the global scope.
> 
> is defining a make variable, not an environment variable.
> I'm not saying that you should use -e but, as evidence,
> note the wording:
> 
>      -e      Let environment variables override global variables within
>              makefiles.
> 
> So: global variables are not environment variables.
> 
> As far as I can tell, you have not been using META_MODE
> based on what you report doing.
> 
> Absent META_MODE use, the system's recent change to using
> WITHOUT_CLEAN by default for buildworld and buildkernel
> can leave more files stale that should have been
> rebuilt.
> 
> > At the same time, I see
> > # uname -K
> > 1500063
> > # uname -U
> > 1500061
> 
> -U output is a separate issue from the kernel
> and module version requirement mismatches as
> far as I can tell.
> 
> uname gets the -U output text via:
> 
> static void                                      
> native_uservers(void)
> {                                                
>         static char buf[128];
>         
>         snprintf(buf, sizeof(buf), "%d", __FreeBSD_version);
>         uservers = buf;
> }
> 
> Recently WITHOUT_CLEAN became the default. So
> its status for your build depends on the timing
> of the commit that you used.
> 
> A WITHOUT_CLEAN (non-META_MODE) build might not
> have rebuilt uname. What does the following report:
> 
> # file /usr/obj/usr/src/arm.armv7/usr.bin/uname/uname
> 
> My paths are different, but here is an example (I've not
> built for armv7 in a long time):
> 
> # file /usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/usr.bin/uname/uname
> /usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/usr.bin/uname/uname: ELF 32-bit LSB executable, ARM, EABI5 version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 15.0 (1500048), not stripped
> 
> Note that "(1500048)". Which number does your build
> of uname show for what is in your build tree (not
> what is installed)?
> 
# file /usr/obj/usr/src/arm.armv7/usr.bin/uname/uname
/usr/obj/usr/src/arm.armv7/usr.bin/uname/uname: ELF 32-bit LSB executable, ARM, EABI5 version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 15.0 (1500063), not stripped
#
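
Since the build tree shows 1500063 while the running uname -U reports
1500061, comparing the same number for the installed binary seems like
the natural next check. A small sketch for pulling the number out of
file(1) output; the helper name elf_osversion is hypothetical, and it
assumes the "for FreeBSD X.Y (N)" wording shown above.

```sh
# Extract the FreeBSD version number that file(1) prints in parentheses,
# e.g. "... for FreeBSD 15.0 (1500063), not stripped" -> 1500063.
# Prints nothing if the line does not contain that wording.
elf_osversion() {
    sed -n 's/.*for FreeBSD [0-9.]* (\([0-9][0-9]*\)).*/\1/p'
}

# Usage on the affected machine (built copy vs. installed copy):
#   file /usr/obj/usr/src/arm.armv7/usr.bin/uname/uname | elf_osversion
#   file /usr/bin/uname | elf_osversion
```

If the two numbers differ, the installed uname is older than the one in
the build tree.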

> > How can the kernel and modules get out of sync?
> 
> By using WITHOUT_CLEAN (recently the default) without META_MODE,
> which would cause updates based on better dependency information.
> 
> > Even then, /usr/src was updated  followed by an immediate
> > buildworld/buildkernel.
> 
> You do not mention installations explicitly, just
> builds. Was the world installed too?
Yes, sorry for the ambiguity.

> 
> > In that case, how did world and kernel
> > become mismatched and remain so for a week or more?
> 
> See above about correctly providing META_MODE and
> about the recent change to have WITHOUT_CLEAN by
> default (fewer rebuilds by default).
> 
> > The suggestion to delete /usr/obj is looking unavoidable, are
> > additional steps warranted?
> 
> Given that you seem to have been doing
> builds where META_MODE was not helping to
> track dependencies, I'd recommend a from-scratch
> META_MODE based build to get the tracking
> information fully in place for use in later
> builds.
> 
> If you are to do META_MODE builds, builds that
> disable META_MODE mess up its dependency tracking
> information by not updating it. Thus, only if
> you stop using META_MODE overall should you avoid
> using META_MODE for a specific build.
>

When filemon stopped loading I ran a few build/install 
cycles without meta mode. I thought that would be harmless, 
but maybe not. Right now meta mode is running with the 
NO_FILEMON option.
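
For the record, my understanding of the quoted src.conf text is that
meta mode has to come from the environment or from the SRC_ENV_CONF
file rather than from a -D flag. A minimal fragment, assuming the
default /etc/src-env.conf location mentioned above:

```sh
# /etc/src-env.conf (default SRC_ENV_CONF location per the quoted text)
# Environment-only build knob; read before the makefiles are processed.
WITH_META_MODE=yes
```

With that in place, buildworld and buildkernel should pick up meta mode
without anything extra on the make command line.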

I'll let it run till it either finishes or crashes, then
decide whether to remove /usr/obj after rebooting.

One possible culprit is of course me: From time to time
the machine seemingly stalls, with no response to keyboard
nor debugger escape. I've routinely power-cycled the machine
to recover, confident that internal checks would catch any
inconsistencies that might be introduced by ungraceful
shutdown. Might my confidence be misplaced? 

Thanks for writing!

bob prohaska