Re: CURRENT: havock: elf_load_section: truncated ELF file

From: Warner Losh <imp_at_bsdimp.com>
Date: Sun, 21 Dec 2025 06:11:37 UTC
On Sat, Dec 20, 2025 at 3:31 PM A FreeBSD User <freebsd@walstatt-de.de>
wrote:

> On the day of the Lord, Sat, 20 Dec 2025 08:10:59 -0700
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Sat, Dec 20, 2025 at 6:12 AM A FreeBSD User <freebsd@walstatt-de.de>
> > wrote:
> >
> > > Hello,
> > >
> > > recently a small server running recent CURRENT, with a UFS-based system
> > > SSD (NVMe) and a data graveyard based on RAID level 5 with ZFS (attached
> > > to a Fujitsu HBA controller), got corrupted because of "losing" a drive.
> > > This time the system reported TWO drives as removed from a RAID level 5,
> > > which is like a death sentence.
> > >
> > > I guess this is fallout from the recently changed timing parameters in
> > > the CAM infrastructure (I can't find any notes on this in man cam, so I
> > > feel lost).
> > >
> >
> > Unlikely, but you can set this in the boot loader:
> > kern.cam.tur_timeout=60
> > kern.cam.inquiry_timeout=60
> > kern.cam.modesense_timeout=60
>
> I'll check, thanks. Are these OIDs documented somewhere to be at hand just
> in case? I searched the recent cam manpage ...
>

scsi.4:
SYSCTL VARIABLES
     The following variables are available as both sysctl(8) variables and
     loader(8) tunables:

     kern.cam.cam_srch_hi
         Search above LUN 7 for SCSI3 and greater devices.

     kern.cam.tur_timeout
         Timeout, in ms, for the initial TESTUNITREADY command we send to the
         devices during their initial probing.  Defaults to 1s.  FreeBSD 15
         and earlier set this to 60s.

     kern.cam.inquiry_timeout
         Timeout, in ms, for the initial INQUIRY command we send to the
         devices during their initial probing.  Defaults to 1s.  FreeBSD 15
         and earlier set this to 60s.

     kern.cam.reportluns_timeout
         Timeout, in ms, for the initial REPORTLUNS command we send to the
         devices during their initial probing.  Defaults to 50s.

     kern.cam.modesense_timeout
         Timeout, in ms, for the initial MODESENSE command we send to the
         devices during their initial probing.  Defaults to 1s.  FreeBSD 15
         and earlier set this to 60s.
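
One thing to watch is the units: going by the text above, the values are in
ms, so restoring the old 60s behaviour would presumably mean 60000 rather
than 60. A small sketch, assuming the units really are milliseconds
(cam_timeout_lines is just a made-up helper name), that prints the lines to
put in /boot/loader.conf for a given number of seconds:

```sh
#!/bin/sh
# Hypothetical helper: print loader.conf(5) lines that set the three probe
# timeouts named above to SECS seconds, converted to milliseconds.
# Assumption: the tunables take milliseconds, as the scsi.4 excerpt states.
cam_timeout_lines() {
    secs=$1
    ms=$(( secs * 1000 ))
    for oid in tur inquiry modesense; do
        printf 'kern.cam.%s_timeout="%s"\n' "$oid" "$ms"
    done
}

# Emit the lines for the old 60 second timeout.
cam_timeout_lines 60
```

Since these are also sysctl variables, "sysctl kern.cam.tur_timeout" after
boot should show whether the value was picked up.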


> >
> > and see if that works.  You should see new errors on boot if this is the
> > issue. Can you share a dmesg?
> >
> > I kinda doubt they'd cause the issues that you've had. If disks are gone,
> > then there'd be different errors to what you are seeing, I'd think.
> >
> > To recover, your best bet is to use a USB stick from one of the release
> > or snapshots.
>
> In earlier times, when "make installkernel" and/or "make installworld"
> crashed midair, some binaries in the installed tree were corrupted. Since I
> run CURRENT, which has a tough pace at the moment, the USB image I boot
> should be close to the CURRENT made via "make world" ... I assume. I did so
> and had some problems with the new pkg concept ... (working offline is a
> problem with the install-blob.txz ...)
>

Yuck. Sorry that was a source of trouble for you.


> >
> > Warner
> >
> >
> > > A very disastrous side effect of this crash was the inability to reboot
> > > the box (CURRENT pre-
> > > 16.0-CURRENT #11 master-n282659-7f39d05b67ae: Sat Dec 20 09:35:32 CET
> > > 2025 amd64, the runtime system was from the 16th or 17th of December).
> > > After several tens of minutes I had to hard-reboot the box, with obvious
> > > data loss on the system SSD. And here my problems start to turn into a
> > > mess.
> > >
> > > After the first initial reboot I performed a fsck -fy, rebooted and
> > > witnessed that jails didn't come up anymore and SSHD didn't work. So I
> > > installed the CURRENT already compiled prior to the crash from /usr/src,
> > > which is "master-n282659-7f39d05b67ae" (same as the sibling box, which is
> > > running great by the way, but with a different CPU and a smaller RAID,
> > > also with a system SSD based on a UFS filesystem and the same HBA). So
> > > CURRENT seems to operate in general on similar hardware.
> > >
> > > After the second reboot with the old kernel the box in question went
> > > into the debugger. Rebooting in single user mode and performing fsck -fy
> > > revealed a lot of repairs on the first partitions, /, /var, /usr. After a
> > > reboot I realized that most services are now broken: jails do not start,
> > > sshd doesn't start, and the whole system goes into multiuser but seems
> > > to have serious problems.
> > >
> > > uname -a remains empty
> > > cd /usr/src; make buildworld returns immediately empty, no further action
> > > service ldconfig start also returns completely empty on console
> > >
> > > Several onboard/base tools simply return nothing.
> > >
> > > Trying "/rescue/sh" (install date indicates 20th of December, so it is
> > > the latest) or even /rescue/vi (to edit loader to change to boot.old)
> > > seems to give me the first indication that something has gone terribly
> > > wrong:
> > >
> > > elf_load_section: truncated ELF file
> > > Abort trap
> > >
> > > On checking, /boot/kernel, /lib, /usr/lib, /bin and /sbin seem to be
> > > intact (as far as I can check, all timestamps are 20th Dec 2025, 9:48
> > > UTC).
> > >
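
Matching timestamps don't say much about the contents. As far as I
remember, that message is printed when a loadable segment claims more bytes
than the file actually has, so a rough check along those lines can be
scripted. This is only a sketch: it assumes 64-bit ELF files read on a
little-endian host, uses nothing but od(1), tr(1) and wc(1), and
elf_truncated and read_uint are made-up names:

```sh
#!/bin/sh
# read_uint FILE OFFSET NBYTES: print an unsigned integer read in host byte
# order (assumption: little-endian host, matching the ELF files checked).
read_uint() {
    od -An -v -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# elf_truncated FILE
# Returns 0 if a PT_LOAD segment (or the program header table itself)
# extends past the end of the file, 1 if the file looks complete,
# 2 if it is not a 64-bit ELF file at all (this includes empty files).
elf_truncated() {
    f=$1
    [ -f "$f" ] || return 2
    size=$(( $(wc -c < "$f") ))
    magic=$(od -An -v -t x1 -N 4 "$f" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 2
    [ "$size" -ge 64 ] || return 0                   # ELF64 header is 64 bytes
    [ "$(read_uint "$f" 4 1)" -eq 2 ] || return 2    # EI_CLASS: 2 = ELF64
    phoff=$(read_uint "$f" 32 8)                     # e_phoff
    phentsize=$(read_uint "$f" 54 2)                 # e_phentsize
    phnum=$(read_uint "$f" 56 2)                     # e_phnum
    i=0
    while [ "$i" -lt "$phnum" ]; do
        base=$(( $phoff + $i * $phentsize ))
        [ $(( $base + $phentsize )) -le "$size" ] || return 0
        p_type=$(read_uint "$f" "$base" 4)
        if [ "$p_type" -eq 1 ]; then                 # PT_LOAD
            p_offset=$(read_uint "$f" $(( $base + 8 )) 8)
            p_filesz=$(read_uint "$f" $(( $base + 32 )) 8)
            [ $(( $p_offset + $p_filesz )) -le "$size" ] || return 0
        fi
        i=$(( $i + 1 ))
    done
    return 1
}

# Possible use from a rescue environment (paths as named in this thread):
#   for f in /rescue/* /bin/* /sbin/*; do
#       elf_truncated "$f" && echo "truncated: $f"
#   done
```

A file that passes can still be corrupted in other ways; this only looks
for the one condition behind the error above.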
> > > Well, since this is not the first time I ran into problems using
> > > CURRENT, the outage due to two lost ZFS drives after the recent changes
> > > seems worth noting here.
> > >
> >
> > Can you provide error messages at boot for this? You talk about fsck and
> > about ZFS, so I'm a little confused as to your setup.
>
> No need to be confused: CURRENT crashed/froze after two of five HDDs were
> reported as "removed" from a RAIDZ pool. The box hung forever.
>
> The OS resides on an SSD with UFS. After > 30 min I had to switch the box
> off and on physically. So the UFS filesystem took a hit (journalling didn't
> fix it). ZFS "healed" after reboot and checking the HDDs. The UFS SSD
> didn't ...
>
>
> I have spent a while now bringing everything back. The boot device is now
> ZFS, too. It is therefore obviously slower, but somewhat safe.
>
> The only issue I have now is a crash on reboot. While rebooting and
> killing jails, the box drops into the kernel debugger ...
>
> I need to somehow copy over the picture I took of the box, since the
> machine isn't connected to the net at the moment ...
>




> >
> > Warner
> >
> >
> > > The other question would be how to fix this: one strategy would be to
> > > boot from an official image on a flash drive and try to perform a "make
> > > installkernel installworld". Maybe there is another way indicated by
> > > what I described above ...
> > >
> >
> >
> >
> >
> > > Thanks in advance,
> > >
> > > oh
> > >
> > >
> > > --
> > >
> > > A FreeBSD user
> > >
>
>
>
> --
>
> A FreeBSD user
>