Re: CURRENT: havock: elf_load_section: truncated ELF file
- In reply to: Warner Losh : "Re: CURRENT: havock: elf_load_section: truncated ELF file"
Date: Mon, 22 Dec 2025 19:23:13 UTC
On Sat, 20 Dec 2025 23:11:37 -0700, Warner Losh <imp@bsdimp.com> wrote:

> On Sat, Dec 20, 2025 at 3:31 PM A FreeBSD User <freebsd@walstatt-de.de>
> wrote:
>
> > On Sat, 20 Dec 2025 08:10:59 -0700,
> > Warner Losh <imp@bsdimp.com> wrote:
> >
> > > On Sat, Dec 20, 2025 at 6:12 AM A FreeBSD User <freebsd@walstatt-de.de>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > recently a small server running recent CURRENT with a UFS-based system
> > > > SSD (NVMe) and a data graveyard based on RAID level 5 with ZFS
> > > > (attached to a Fujitsu HBA controller) got corrupted because of
> > > > "losing" a drive - this time the system reported TWO drives as removed
> > > > from a RAID level 5 - which is like a death sentence.
> > > >
> > > > I guess this is fallout from the recently changed time parameters of
> > > > the CAM infrastructure (I can't find any notes on this in man cam, so
> > > > I feel lost).
> > >
> > > Unlikely, but you can set this in the boot loader:
> > > kern.cam.tur_timeout=60
> > > kern.cam.inquiry_timeout=60
> > > kern.cam.modesense_timeout=60
> >
> > I'll check, thanks. Are these OIDs documented somewhere to be at hand just
> > in case? I searched the recent cam manpage ...
>
> scsi.4:
> SYSCTL VARIABLES
>      The following variables are available as both sysctl(8) variables and
>      loader(8) tunables:
>
>      kern.cam.cam_srch_hi
>          Search above LUN 7 for SCSI3 and greater devices.
>
>      kern.cam.tur_timeout
>          Timeout, in ms, for the initial TESTUNITREADY command we send to
>          the devices during their initial probing. Defaults to 1s.
>          FreeBSD 15 and earlier set this to 60s.
>
>      kern.cam.inquiry_timeout
>          Timeout, in ms, for the initial INQUIRY command we send to the
>          devices during their initial probing. Defaults to 1s. FreeBSD 15
>          and earlier set this to 60s.
>
>      kern.cam.reportluns_timeout
>          Timeout, in ms, for the initial REPORTLUNS command we send to the
>          devices during their initial probing. Defaults to 50s.
>
>      kern.cam.modesense_timeout
>          Timeout, in ms, for the initial MODESENSE command we send to the
>          devices during their initial probing. Defaults to 1s. FreeBSD 15
>          and earlier set this to 60s.

Oh, I see, thank you for the hint.

oh

> > > and see if that works. You should see new errors on boot if this is the
> > > issue. Can you share a dmesg?
> > >
> > > I kinda doubt they'd cause the issues that you've had. If disks are
> > > gone, then there'd be different errors to what you are seeing, I'd
> > > think.
> > >
> > > To recover, your best bet is to use a USB stick from one of the release
> > > or snapshots.
> >
> > In earlier times, when "make installkernel" and/or "make installworld"
> > crashed midair, some binaries in the installed tree were corrupted, and
> > since I run CURRENT, which has a tough pace at the moment, the USB image
> > booting should be close to the CURRENT made via "make world" ... I
> > assume. I did so and had some problems with the new pkg concept ...
> > (working offline is a problem with the install-blob.txz ...)
>
> Yuck. Sorry that was a source of trouble for you.
>
> > > Warner
> > >
> > > > A very disastrous side effect of this crash was the inability to
> > > > reboot the box (CURRENT pre-
> > > > 16.0-CURRENT #11 master-n282659-7f39d05b67ae: Sat Dec 20 09:35:32 CET
> > > > 2025 amd64, the runtime system was from 16th or 17th of December).
> > > > After several tens of minutes I had to hard reboot the box - with
> > > > obvious data loss on the system SSD. And here my problems start to
> > > > turn into a mess.
> > > >
> > > > After the first initial reboot I performed a fsck -fy, rebooted and
> > > > witnessed that jails didn't come up anymore and SSHD didn't work.
> > > > So I installed the CURRENT already compiled prior to the crash from
> > > > /usr/src, which is "master-n282659-7f39d05b67ae" (same as the sibling
> > > > box, which is running great by the way, but with a different CPU and
> > > > smaller RAID, but also a system SSD based on a UFS filesystem, same
> > > > HBA). So CURRENT seems to operate in general on similar hardware.
> > > >
> > > > After the second reboot with the old kernel the box in question went
> > > > into the debugger; rebooting in single user mode and performing
> > > > fsck -fy revealed a lot of repairs on the first partitions, /, /var,
> > > > /usr. After a reboot I realized that most services are now broken -
> > > > jails do not start, sshd doesn't start and the whole system is going
> > > > into multiuser, but seems to have serious problems.
> > > >
> > > > uname -a remains empty
> > > > cd /usr/src; make buildworld returns immediately empty, no further
> > > > action
> > > > service ldconfig start also returns completely empty on console
> > > >
> > > > Several onboard/base tools simply return nothing.
> > > >
> > > > Trying "/rescue/sh" (install date indicates 20th of December, so it is
> > > > the latest) or even /rescue/vi (to edit loader to change to boot.old)
> > > > seems to give me the first indication that something has gone
> > > > terribly wrong:
> > > >
> > > > elf_load_section: truncated ELF file
> > > > Abort trap
> > > >
> > > > Checking /boot/kernel, /lib, /usr/lib, /bin or /sbin, everything seems
> > > > to be intact (as far as I can check, all timestamps are 20th Dec 2025,
> > > > 9:48 UTC).
> > > >
> > > > Well, since this is not the first time I ran into some problems using
> > > > CURRENT, the outage due to two lost ZFS drives after the recent
> > > > changes seems worthy of a note here.
> > >
> > > Can you provide error messages at boot for this?
> > > You talk about fsck and about ZFS, so I'm a little confused as to your
> > > setup.
> >
> > No need to be confused: the CURRENT crashed/froze after two of five HDDs
> > were reported as "removed" from a RAIDZ pool. The box hung forever.
> >
> > The OS resides on an SSD with UFS. After more than 30 min I had to switch
> > the box off/on physically. So the UFS filesystem had a bump (journalling
> > didn't fix it). ZFS "healed" after reboot and checking the HDDs. The UFS
> > SSD didn't ...
> >
> > I have now spent a while bringing everything back. The boot device is now
> > ZFS, too. And, therefore, obviously slower but somehow safe.
> >
> > The only issue I have now is a crash after a reboot. While rebooting and
> > killing jails, the box drops into the kernel debugger ...
> >
> > Somehow I need to copy the picture I made from the box, since the machine
> > isn't connected to the net at the moment ...
> >
> > > Warner
> > >
> > > > The other question would be how to fix: one strategy would be to boot
> > > > from an official image from a flash drive and try to perform a
> > > > "make installkernel installworld". Maybe there is another way
> > > > indicated by what I described above ...
> > > >
> > > > Thanks in advance,
> > > >
> > > > oh
> > > >
> > > > --
> > > > A FreeBSD user
> >
> > --
> > A FreeBSD user

--
A FreeBSD user
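
For reference, the "elf_load_section: truncated ELF file" message quoted above can be narrowed down from a rescue environment by comparing each binary's loadable segments against its actual file size. The following is only a minimal sketch, assuming the message means that a PT_LOAD segment claims more bytes than the file contains; that reading is not confirmed in this thread, and the function name elf_is_truncated is hypothetical. Unchanged timestamps do not rule out this kind of damage, since fsck can shorten a file without touching its modification time.

```python
import os
import struct
import sys

PT_LOAD = 1  # program header type for loadable segments


def elf_is_truncated(path):
    """Return True if a PT_LOAD segment extends past the end of the file,
    False if all loadable segments fit, None if the file is not ELF.

    Hypothetical helper: it only approximates the kernel's check.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            return None
        is64 = ident[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 = little-endian

        hdr_fmt = endian + ("16sHHIQQQIHHHHHH" if is64 else "16sHHIIIIIHHHHHH")
        f.seek(0)
        raw = f.read(struct.calcsize(hdr_fmt))
        if len(raw) < struct.calcsize(hdr_fmt):
            return True                         # header itself is cut short
        fields = struct.unpack(hdr_fmt, raw)
        e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

        ph_fmt = endian + ("IIQQQQQQ" if is64 else "IIIIIIII")
        ph_size = struct.calcsize(ph_fmt)
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            raw = f.read(ph_size)
            if len(raw) < ph_size:
                return True                     # program header table cut short
            ph = struct.unpack(ph_fmt, raw)
            # Field order differs between the 32-bit and 64-bit layouts.
            p_offset, p_filesz = (ph[2], ph[5]) if is64 else (ph[1], ph[4])
            if ph[0] == PT_LOAD and p_offset + p_filesz > size:
                return True
    return False


if __name__ == "__main__":
    # Example: python3 elfcheck.py /rescue/sh /rescue/vi /bin/*
    for p in sys.argv[1:]:
        try:
            result = elf_is_truncated(p)
        except OSError as exc:
            print(f"{p}: cannot read ({exc})")
            continue
        if result:
            print(f"{p}: TRUNCATED")
        elif result is None:
            print(f"{p}: not an ELF file")
```

Run from a known-good environment (for example, booted from a snapshot USB stick with the damaged filesystem mounted), this lists the binaries that would need to be reinstalled. It does not detect corruption inside a segment, only segments that run past the end of the file.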