Re: CURRENT: havock: elf_load_section: truncated ELF file

In reply to: Warner Losh : "Re: CURRENT: havock: elf_load_section: truncated ELF file"
From: A FreeBSD User <freebsd_at_walstatt-de.de>
Date: Mon, 22 Dec 2025 19:23:13 UTC
On the Lord's day, Sat, 20 Dec 2025 23:11:37 -0700,
Warner Losh <imp@bsdimp.com> wrote:

> On Sat, Dec 20, 2025 at 3:31 PM A FreeBSD User <freebsd@walstatt-de.de>
> wrote:
> 
> > On the Lord's day, Sat, 20 Dec 2025 08:10:59 -0700,
> > Warner Losh <imp@bsdimp.com> wrote:
> >  
> > > On Sat, Dec 20, 2025 at 6:12 AM A FreeBSD User <freebsd@walstatt-de.de>
> > > wrote:
> > >  
> > > > Hello,
> > > >
> > > > recently a small server running a recent CURRENT, with a UFS-based system
> > > > SSD (NVMe) and a data graveyard on a RAID level 5 ZFS pool (attached to a
> > > > Fujitsu HBA controller), got corrupted because of "losing" a drive. This
> > > > time the system reported TWO drives as removed from the RAID level 5
> > > > pool, which is like a death sentence.
> > > >
> > > > I guess this is fallout from the recently changed timing parameters in
> > > > the CAM infrastructure (I can't find any notes on this in man cam, so I
> > > > feel lost).
> > > >  
> > >
> > > Unlikely, but you can set this in the boot loader:
> > > kern.cam.tur_timeout=60
> > > kern.cam.inquiry_timeout=60
> > > kern.cam.modesense_timeout=60  
> >
> > I'll check, thanks. Are these OIDs documented somewhere, to have at hand
> > just in case? I searched the recent cam manpage ...
> >  
> 
> scsi.4:
> SYSCTL VARIABLES
>      The following variables are available as both sysctl(8) variables and
>      loader(8) tunables:
> 
>      kern.cam.cam_srch_hi
>          Search above LUN 7 for SCSI3 and greater devices.
> 
>      kern.cam.tur_timeout
>          Timeout, in ms, for the initial TESTUNITREADY command we send to the
>          devices during their initial probing.  Defaults to 1s.  FreeBSD 15
>          and earlier set this to 60s.
> 
>      kern.cam.inquiry_timeout
>          Timeout, in ms, for the initial INQUIRY command we send to the
>          devices during their initial probing.  Defaults to 1s.  FreeBSD 15
>          and earlier set this to 60s.
> 
>      kern.cam.reportluns_timeout
>          Timeout, in ms, for the initial REPORTLUNS command we send to the
>          devices during their initial probing.  Defaults to 50s.
> 
>      kern.cam.modesense_timeout
>          Timeout, in ms, for the initial MODESENSE command we send to the
>          devices during their initial probing.  Defaults to 1s.  FreeBSD 15
>          and earlier set this to 60s.

Oh, I see, thank you for the hint.
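
For the record, a minimal sketch of how the 60 s values could be written
down for loader.conf, assuming (per the scsi.4 excerpt above) that these
tunables are given in milliseconds, so that 60 s becomes 60000. The helper
name loader_conf_lines is made up for illustration. Under that reading a
plain value of 60 would mean 60 ms, which is an assumption I have not
verified against the kernel source.

```python
# Assumption: the tunables take milliseconds, as the scsi.4 excerpt states.
# The values below are the former 60 s defaults mentioned in that excerpt.
PROBE_TIMEOUTS_S = {
    "kern.cam.tur_timeout": 60,
    "kern.cam.inquiry_timeout": 60,
    "kern.cam.modesense_timeout": 60,
}


def loader_conf_lines(timeouts_s):
    """Return name="value" lines with each timeout converted from s to ms."""
    return [f'{oid}="{seconds * 1000}"' for oid, seconds in timeouts_s.items()]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before adding them by hand.
    print("\n".join(loader_conf_lines(PROBE_TIMEOUTS_S)))
```

The output is only meant to be reviewed and then copied into
/boot/loader.conf manually; the script does not touch any file itself.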

oh

> 
> 
> > >
> > > and see if that works.  You should see new errors on boot if this is the
> > > issue. Can you share a dmesg?
> > >
> > > I kinda doubt they'd cause the issues that you've had. If disks are gone,
> > > then there'd be different errors to what you are seeing, I'd think.
> > >
> > > To recover, your best bet is to use a USB stick from one of the releases
> > > or snapshots.
> >
> > In earlier times, when "make installkernel" and/or "make installworld"
> > crashed in mid-air, some binaries in the installed tree were corrupted.
> > Since I run CURRENT, which moves at a tough pace at the moment, the USB
> > image I boot should be close to the CURRENT built via "make world" ...
> > I assume. I did so and had some problems with the new pkg concept ...
> > (working offline is a problem with the install-blob.txz ...)
> >  
> 
> Yuck. Sorry that was a source of trouble for you.
> 
> 
> > >
> > > Warner
> > >
> > >  
> > > > A very disastrous side effect of this crash was the inability to reboot
> > > > the box (CURRENT pre-16.0-CURRENT #11 master-n282659-7f39d05b67ae:
> > > > Sat Dec 20 09:35:32 CET 2025 amd64; the runtime system was from the 16th
> > > > or 17th of December).
> > > > After several tens of minutes I had to hard-reboot the box, with obvious
> > > > data loss on the system SSD. And here my problems start to turn into a
> > > > mess.
> > > >
> > > > After the first reboot I performed an fsck -fy, rebooted, and witnessed
> > > > that jails didn't come up anymore and sshd didn't work. So I installed
> > > > the CURRENT that was already compiled in /usr/src prior to the crash,
> > > > which is "master-n282659-7f39d05b67ae" (the same as on the sibling box,
> > > > which is running great by the way; it has a different CPU and a smaller
> > > > RAID, but also a UFS-based system SSD and the same HBA). So CURRENT
> > > > seems to operate fine in general on similar hardware.
> > > >
> > > > After the second reboot with the old kernel, the box in question went
> > > > into the debugger. Rebooting into single-user mode and performing
> > > > fsck -fy revealed a lot of repairs on the first partitions, /, /var and
> > > > /usr. After a reboot I realized that most services are now broken: jails
> > > > do not start, sshd doesn't start, and the whole system goes into
> > > > multiuser but seems to have serious problems.
> > > >
> > > > uname -a prints nothing
> > > > cd /usr/src; make buildworld returns immediately with no output and no further action
> > > > service ldconfig start also prints nothing at all on the console
> > > >
> > > > Several onboard/base tools simply return nothing.
> > > >
> > > > Trying "/rescue/sh" (the install date indicates 20th of December, so it
> > > > is the latest), or even /rescue/vi (to edit the loader configuration and
> > > > change to boot.old), gives me the first indication that something has
> > > > gone terribly wrong:
> > > >
> > > > elf_load_section: truncated ELF file
> > > > Abort trap
> > > >
> > > > I checked /boot/kernel, /lib, /usr/lib, /bin and /sbin; they seem to be
> > > > intact (as far as I can check; all timestamps are 20th Dec 2025,
> > > > 9:48 UTC).
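
Timestamps alone would not reveal a truncated binary. Below is a minimal
sketch of a more direct check, assuming little-endian ELF64 (amd64)
binaries. It flags a file when any program-header segment claims more bytes
than the file actually contains, which is presumably the kind of condition
behind the "elf_load_section: truncated ELF file" message. The function name
elf_is_truncated is hypothetical and this is not the kernel's own code.

```python
import struct
import sys

EHDR_SIZE = 64  # size of an ELF64 file header
PHDR_SIZE = 56  # size of one ELF64 program header entry


def elf_is_truncated(path):
    """Return True if the file is shorter than its own ELF headers claim.

    Only little-endian ELF64 is handled; anything else raises ValueError.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if len(data) < EHDR_SIZE:
        return True  # not even a complete file header
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is supported")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + PHDR_SIZE > len(data):
            return True  # the program header table itself is cut off
        (p_offset,) = struct.unpack_from("<Q", data, off + 8)
        (p_filesz,) = struct.unpack_from("<Q", data, off + 32)
        if p_offset + p_filesz > len(data):
            return True  # a segment extends past the end of the file
    return False


if __name__ == "__main__":
    for name in sys.argv[1:]:
        try:
            status = "TRUNCATED" if elf_is_truncated(name) else "ok"
        except (OSError, ValueError) as exc:
            status = f"skipped ({exc})"
        print(f"{name}: {status}")
```

It would have to be run from a working environment against the mounted
damaged filesystem, passing the binaries to check as arguments. It needs a
Python 3 interpreter, which an install image may not provide.
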
> > > >
> > > > Well, since this is not the first time I have run into problems using
> > > > CURRENT, the outage due to two lost ZFS drives after the recent changes
> > > > seems worth noting here.
> > > >  
> > >
> > > Can you provide error messages at boot for this? You talk about fsck and
> > > about ZFS, so I'm a little confused as to your setup.  
> >
> > No need to be confused: the CURRENT system crashed/froze after two of five
> > HDDs were reported as "removed" from a RAIDZ pool. The box hung forever.
> >
> > The OS resides on an SSD with UFS. After more than 30 min I had to
> > power-cycle the box physically. So the UFS filesystem took a hit
> > (journalling didn't fix it). ZFS "healed" after the reboot and a check of
> > the HDDs. The UFS SSD didn't ...
> >
> >
> > I have now spent a while bringing everything back. The boot device is now
> > ZFS, too, and therefore obviously slower but somehow safe.
> >
> > The only issue I have now is a crash during reboot. While rebooting and
> > killing jails, the box drops into the kernel debugger ...
> >
> > Somehow I need to copy the picture I took of the box, since the machine
> > isn't connected to the net at the moment ...
> >  
> 
> 
> 
> 
> > >
> > > Warner
> > >
> > >  
> > > > The other question would be how to fix this: one strategy would be to
> > > > boot from an official image on a flash drive and try to perform a
> > > > "make installkernel installworld". Maybe there is another way, indicated
> > > > by what I described above ...
> > > >  
> > >
> > >
> > >
> > >  
> > > > Thanks in advance,
> > > >
> > > > oh
> > > >
> > > >
> > > > --
> > > >
> > > > A FreeBSD user
> > > >  
> >
> >
> >
> > --
> >
> > A FreeBSD user
> >  



-- 

A FreeBSD user