From: Warner Losh
Date: Sat, 20 Dec 2025 23:11:37 -0700
Subject: Re: CURRENT: havock: elf_load_section: truncated ELF file
To: A FreeBSD User
Cc: FreeBSD CURRENT
In-Reply-To: <20251220233127.2ad04793@thor.sb211.local>
List-Archive: https://lists.freebsd.org/archives/freebsd-current


On Sat, Dec 20, 2025 at 3:31 PM A FreeBSD User <freebsd@walstatt-de.de> wrote:
> On the Lord's day, Sat, 20 Dec 2025 08:10:59 -0700
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Sat, Dec 20, 2025 at 6:12 AM A FreeBSD User <freebsd@walstatt-de.de>
> > wrote:
> >
> > > Hello,
> > >
> > > recently a small server running recent CURRENT, with a UFS-based system
> > > SSD (NVMe) and a data graveyard based on RAID level 5 with ZFS (attached
> > > to a Fujitsu HBA controller), got corrupted because of "losing" a drive;
> > > this time the system reported TWO drives as removed from a RAID level 5,
> > > which is like a death sentence.
> > >
> > > I guess this is fallout from the recently changed timeout parameters of
> > > the CAM infrastructure (I can't find any notes on this in man cam, so I
> > > feel lost).
> > >
> >
> > Unlikely, but you can set this in the boot loader:
> > kern.cam.tur_timeout=60
> > kern.cam.inquiry_timeout=60
> > kern.cam.modesense_timeout=60
>
> I'll check, thanks. Are these OIDs documented somewhere, so I have them at
> hand just in case? I searched the recent cam manpage ...

scsi.4:
SYSCTL VARIABLES
     The following variables are available as both sysctl(8) variables and
     loader(8) tunables:

     kern.cam.cam_srch_hi
          Search above LUN 7 for SCSI3 and greater devices.

     kern.cam.tur_timeout
          Timeout, in ms, for the initial TESTUNITREADY command we send to the
          devices during their initial probing.  Defaults to 1s.  FreeBSD 15
          and earlier set this to 60s.

     kern.cam.inquiry_timeout
          Timeout, in ms, for the initial INQUIRY command we send to the
          devices during their initial probing.  Defaults to 1s.  FreeBSD 15
          and earlier set this to 60s.

     kern.cam.reportluns_timeout
          Timeout, in ms, for the initial REPORTLUNS command we send to the
          devices during their initial probing.  Defaults to 50s.

     kern.cam.modesense_timeout
          Timeout, in ms, for the initial MODESENSE command we send to the
          devices during their initial probing.  Defaults to 1s.  FreeBSD 15
          and earlier set this to 60s.

> >
> > and see if that works. You should see new errors on boot if this is the
> > issue. Can you share a dmesg?
> >
> > I kinda doubt they'd cause the issues that you've had. If disks are gone,
> > then there'd be different errors from what you are seeing, I'd think.
> >
> > To recover, your best bet is to use a USB stick from one of the releases
> > or snapshots.
>
> In earlier times, when "make installkernel" and/or "make installworld"
> crashed midair, some binaries in the installed tree were corrupted. Since I
> run CURRENT, which moves at a tough pace at the moment, the USB image I
> boot from should be close to the CURRENT built via "make world" ... I
> assume. I did so and had some problems with the new pkg concept ...
> (working offline is a problem with the install-blob.txz ...)
>
Yuck. Sorry that was a source of trouble for you.

> >
> > Warner
> >
> >
> > > A very disastrous side effect of this crash was the inability to reboot
> > > the box (CURRENT pre-16.0-CURRENT #11 master-n282659-7f39d05b67ae: Sat
> > > Dec 20 09:35:32 CET 2025 amd64; the runtime system was from the 16th or
> > > 17th of December). After several tens of minutes I had to hard reboot
> > > the box, with obvious data loss on the system SSD. And here my problems
> > > start to turn into a mess.
> > >
> > > After the first reboot I performed a fsck -fy, rebooted, and witnessed
> > > that jails didn't come up anymore and SSHD didn't work. So I installed
> > > the CURRENT that I had already compiled in /usr/src before the crash,
> > > "master-n282659-7f39d05b67ae" (the same as on the sibling box, which is
> > > running great by the way; it has a different CPU and a smaller RAID, but
> > > also a UFS system SSD and the same HBA). So CURRENT seems to operate fine
> > > in general on similar hardware.
> > >
> > > After the second reboot with the old kernel, the box in question went
> > > into the debugger. Rebooting into single-user mode and running fsck -fy
> > > revealed a lot of repairs on the first partitions (/, /var, /usr). After
> > > a reboot I realized that most services are now broken: jails do not
> > > start, sshd doesn't start, and the whole system goes into multiuser but
> > > seems to have serious problems.
> > >
> > > uname -a prints nothing.
> > > cd /usr/src; make buildworld returns immediately with no output and no
> > > further action.
> > > service ldconfig start also returns with no output at all on the console.
> > >
> > > Several onboard/base tools simply return nothing.
> > >
> > > Trying /rescue/sh (its install date is the 20th of December, so it is
> > > the latest) or even /rescue/vi (to edit the loader settings to change to
> > > boot.old) gives me the first indication that something has gone terribly
> > > wrong:
> > >
> > > elf_load_section: truncated ELF file
> > > Abort trap
> > >
> > > /boot/kernel, /lib, /usr/lib, /bin and /sbin seem to be intact as far as
> > > I can check (all timestamps are 20th Dec 2025, 9:48 UTC).
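The "elf_load_section: truncated ELF file" message comes from the kernel's ELF image activator (sys/kern/imgact_elf.c). Judging from that source, it appears to be printed when a PT_LOAD segment claims more file data than the file actually holds (p_filesz is non-zero and p_offset + p_filesz is larger than the file size), or when p_filesz exceeds p_memsz. A binary damaged by an unclean shutdown can fail that check while its timestamp still looks normal, so checking the headers directly says more than the dates do. What follows is a minimal sketch of such a check, assuming a Python 3 interpreter is available (Python is not part of the FreeBSD base system, so it would be run from another system or boot environment with the damaged tree mounted). The names check_elf, main and NotELF are hypothetical, and the script re-implements only this one header check, not the loader.

```python
#!/usr/bin/env python3
"""Hypothetical helper: flag ELF files whose PT_LOAD segments do not fit.

Re-implements only the sanity check that appears to trigger the kernel's
"elf_load_section: truncated ELF file" message; it is not a full ELF loader.
"""
import os
import struct
import sys

ELF_MAGIC = b"\x7fELF"
PT_LOAD = 1

# EI_CLASS -> (ELF header layout after e_ident, program header layout,
#              indices of p_type, p_offset, p_filesz, p_memsz in that layout)
LAYOUTS = {
    1: ("HHIIIIIHHHHHH", "IIIIIIII", (0, 1, 4, 5)),  # ELFCLASS32
    2: ("HHIQQQIHHHHHH", "IIQQQQQQ", (0, 2, 5, 6)),  # ELFCLASS64
}


class NotELF(Exception):
    """Raised for files that do not start with the ELF magic bytes."""


def check_elf(path):
    """Return a list of problems found in the ELF file at path ([] if none)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        ident = f.read(16)
        if ident[:4] != ELF_MAGIC:
            raise NotELF(path)
        if len(ident) < 16:
            return ["ELF identification bytes are truncated"]
        ei_class, ei_data = ident[4], ident[5]
        if ei_class not in LAYOUTS or ei_data not in (1, 2):
            return [f"unsupported EI_CLASS {ei_class} / EI_DATA {ei_data}"]
        order = "<" if ei_data == 1 else ">"  # ELFDATA2LSB / ELFDATA2MSB
        hdr_fmt, ph_fmt, idx = LAYOUTS[ei_class]
        hdr_fmt, ph_fmt = order + hdr_fmt, order + ph_fmt

        raw = f.read(struct.calcsize(hdr_fmt))
        if len(raw) < struct.calcsize(hdr_fmt):
            return ["ELF header is truncated"]
        hdr = struct.unpack(hdr_fmt, raw)
        phoff, phentsize, phnum = hdr[4], hdr[8], hdr[9]

        problems = []
        ph_size = struct.calcsize(ph_fmt)
        for i in range(phnum):
            f.seek(phoff + i * phentsize)
            raw = f.read(ph_size)
            if len(raw) < ph_size:
                problems.append(f"program header {i} lies past end of file")
                break
            ph = struct.unpack(ph_fmt, raw)
            p_type, p_offset, p_filesz, p_memsz = (ph[j] for j in idx)
            if p_type != PT_LOAD:
                continue
            # Assumed to mirror the kernel check: filsz != 0 and
            # offset + filsz > file size, or filsz > memsz.
            if p_filesz != 0 and p_offset + p_filesz > size:
                problems.append(
                    f"PT_LOAD {i}: offset {p_offset:#x} + filesz {p_filesz:#x}"
                    f" exceeds file size {size:#x}")
            if p_filesz > p_memsz:
                problems.append(
                    f"PT_LOAD {i}: filesz {p_filesz:#x} > memsz {p_memsz:#x}")
    return problems


def main(dirs):
    """Scan regular files under dirs; return 1 if any ELF file looks damaged."""
    found = False
    for top in dirs:
        for dirpath, _subdirs, files in os.walk(top):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                try:
                    problems = check_elf(path)
                except NotELF:
                    continue  # scripts, data files, etc.
                except OSError as e:
                    print(f"{path}: {e}", file=sys.stderr)
                    continue
                for p in problems:
                    print(f"{path}: {p}")
                found = found or bool(problems)
    return 1 if found else 0


if __name__ == "__main__":
    # Only scan when directories are given on the command line.
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    print(f"usage: {sys.argv[0]} DIR...", file=sys.stderr)
```

For example, with the damaged root file system mounted read-only at /mnt (a hypothetical mount point) and the script saved as check_elf.py, running it with /mnt/bin /mnt/sbin /mnt/lib /mnt/usr/lib /mnt/rescue as arguments prints each suspicious file with the reason and exits with status 1 if anything was flagged. Files that were zero-filled rather than shortened do not start with the ELF magic and are silently skipped by this sketch.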
> > >
> > > Well, since this is not the first time I have run into problems using
> > > CURRENT, the outage due to two lost ZFS drives after the recent changes
> > > seems worth a note here.
> > >
> >
> > Can you provide error messages at boot for this? You talk about fsck and
> > about ZFS, so I'm a little confused as to your setup.
>
> No need to be confused: CURRENT crashed/froze after two of five HDDs were
> reported as "removed" from a RAIDZ pool. The box hung forever.
>
> The OS resides on an SSD with UFS. After more than 30 min I had to switch
> the box off and on physically. So the UFS filesystem took a hit (journaling
> didn't fix it). ZFS "healed" after the reboot and checking the HDDs; the
> UFS SSD didn't ...
>
> I have now spent a while bringing everything back. The boot device is now
> ZFS, too, and therefore obviously slower, but somehow safe.
>
> The only issue I have now is a crash on reboot: while rebooting and killing
> jails, the box drops into the kernel debugger ...
>
> I still need to copy the picture I took of the box, since the machine isn't
> connected to the net at the moment ...
>
> >
> > Warner
> >
> >
> > > The other question is how to fix this: one strategy would be to boot an
> > > official image from a flash drive and try to perform a
> > > "make installkernel installworld". Maybe there is an alternative to what
> > > I described above ...
> > >
> >
> >
> > > Thanks in advance,
> > >
> > > oh
> > >
> > >
> > > --
> > >
> > > A FreeBSD user
> > >
>
> --
>
> A FreeBSD user