Re: "options MAXMEMDOM=2" vs. amd64 DBG kernel booting: 3000+ "kernel: Process (pid 1) got signal 5" notices during booting

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 02 Dec 2023 04:18:20 UTC
On Nov 23, 2023, at 13:28, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 21, 2023, at 21:43, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> While my kernel/world build procedures build both DBG and NODBG
>> kernels and worlds, I normally run the NODBG kernel and world,
>> using DBG only when I need to for problem investigation.
>> 
>> I recently had reason to use the DBG kernel and found it got the
>> oddity of 3000+ instances of "kernel: Process (pid 1) got signal 5"
>> during booting, as reported in /var/log/messages . An example is:
>> 
>> . . .
>> Nov 20 23:13:09 7950X3D-UFS shutdown[20174]: reboot by root: 
>> Nov 20 23:13:09 7950X3D-UFS syslogd: exiting on signal 15
>> Nov 20 23:14:21 7950X3D-UFS syslogd: kernel boot file is /boot/kernel/kernel

What normally looks to be output just before the odd messages
below is the likes of:

. . .
ugen1.3: <AsusTek Computer Inc. AURA LED Controller> at usbus1
ugen1.4: <Corsair CORSAIR iCUE COMMANDER Core> at usbus1

The odd messages are reported in /var/log/messages as:

>> Nov 20 23:14:21 7950X3D-UFS kernel: got signal 5
>> Nov 20 23:14:21 7950X3D-UFS kernel: Process (pid 1) got signal 5
>> Nov 20 23:14:21 7950X3D-UFS syslogd: last message repeated 3133 times

The text of what would normally be in the output here
is the likes of:

Root mount waiting for: CAM
. . .
Root mount waiting for: CAM
nda0 at nvme0 bus 0 scbus4 target 0 lun 1
nda0: <Samsung SSD 970 EVO Plus 2TB 2B2QEXM7 S59CNM0W518941Y>
nda0: Serial Number REDACTED
nda0: nvme version 1.3
nda0: 1907729MB (3907029168 512 byte sectors)
nda1 at nvme1 bus 0 scbus5 target 0 lun 1
nda1: <INTEL SSDPE21D015TA E2010480 PHKE150100MV1P5CGN>
nda1: Serial Number REDACTED
nda1: nvme version 1.0
nda1: 1430799MB (2930277168 512 byte sectors)
nda2 at nvme2 bus 0 scbus6 target 0 lun 1
nda2: <INTEL SSDPED1D015TAY E2010603 PHMB934100211P5FGN>
nda2: Serial Number REDACTED
nda2: nvme version 1.0
nda2: 1430799MB (2930277168 512 byte sectors)
nda3 at nvme3 bus 0 scbus7 target 0 lun 1
nda3: <Samsung SSD 960 PRO 2TB 4B6QCXP7 S3EXNX0J502345D>
nda3: Serial Number REDACTED
nda3: nvme version 1.2
nda3: 1953514MB (4000797360 512 byte sectors)
nda4 at nvme4 bus 0 scbus8 target 0 lun 1
nda4: <Samsung SSD 960 PRO 1TB 4B6QCXP7 S3EVNWAJ300110H>
nda4: Serial Number REDACTED
nda4: nvme version 1.2
nda4: 976762MB (2000409264 512 byte sectors)
Setting hostuuid: REDACTED.
Setting hostid: REDACTED.
Starting file system checks:
/dev/gpt/FBSDFSSDroot: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/FBSDFSSDroot: clean, 221370202 free (400602 frags, 27621200 blocks, 0.1% fragmentation)
Mounting local filesystems:.
Autoloading module: acpi_wmi
Autoloading module: ig4
Autoloading module: intpm

But I do not get to see such when the 3000+ messages
happen.

>> Nov 20 23:14:21 7950X3D-UFS kernel: intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
>> . . .
>> 
>> This stopped when I commented out the:
>> 
>> options        MAXMEMDOM=2

The 3000+ messages returned, with no MAXMEMDOM assignment present.
I'd updated FreeBSD and replaced the 96 GiBytes of RAM with
192 GiBytes of RAM.

"boot -s" and "boot -v" still get the messages.

In hopes of recording the messages to see what is last before
the messages start (no serial console) I tried adding
"kern.msgbufsize=4587520" to /boot/loader.conf . The result was
no "kernel: Process (pid 1) got signal 5" messages at all for
the boot of the debug kernel+world.

That change and the earlier MAXMEMDOM change both altering the
behavior suggests a sensitivity to memory layout.

I've commented out "kern.msgbufsize=4587520" for testing in a
context where the 3000+ messages occur.
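
For checking whether a given boot got the messages, and roughly how
many, something like the following sketch could total them from a
syslog file. It assumes the /var/log/messages line shapes shown above,
and that a syslogd "last message repeated N times" line refers to the
line just before it. The function name count_sig5 is just for
illustration.

```shell
#!/bin/sh
# Total the "Process (pid 1) got signal 5" notices in a syslog file,
# including those folded into "last message repeated N times" lines.
# Assumption: a repeat line applies to the immediately preceding line.
count_sig5() {
    awk '
        / kernel: Process \(pid 1\) got signal 5/ {
            n++          # one explicit notice
            prev = 1     # remember that the last message was a notice
            next
        }
        / syslogd: last message repeated [0-9]+ times/ {
            if (prev) n += $(NF - 1)   # N is the next-to-last field
            next
        }
        { prev = 0 }     # any other line ends the run
        END { print n + 0 }
    ' "$1"
}

# Example use (path as on this system):
#   count_sig5 /var/log/messages
```

Note that the total covers every boot recorded in the file, so a
freshly rotated log is the simplest context for a per-boot figure.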


>> that I've had historically and built, installed, and booted
>> the resulting DBG kernel.
>> 
>> I'll note that I never had the messages for the NODBG kernel,
>> despite it also having that line.
>> . . .

I've still not seen such from a non-debug kernel+world.

I'll be doing a from scratch "bulk -a" under the debug kernel and
world context as part of testing: https://reviews.freebsd.org/D42767
This should also give an idea of whether the context is unstable.

===
Mark Millard
marklmi at yahoo.com