Re: git: 076002f24d35 - main - Do comprehensive UFS/FFS superblock integrity checks when reading a superblock.

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Mon, 30 May 2022 22:53:29 UTC
In message <YpVIJ0ZuPCrGE5zy@albert.catwhisker.org>, David Wolfskill writes:
> 
>
> On Mon, May 30, 2022 at 04:33:47PM -0600, Warner Losh wrote:
> > On Mon, May 30, 2022 at 4:24 PM Cy Schubert <Cy.Schubert@cschubert.com>
> > wrote:
> >
> > > Upgrading boot blocks didn't help either.
> > >
> > > It only happened on one of four machines. Likely because the other three
> > > are AMD on Asus MBs while the problem machine is an Acer laptop running
> > > Intel.
>
> In my case, the laptops are OK, but my build machine is affected.
>
> Info on it is anchored from
> https://www.catwhisker.org/~david/FreeBSD/history/ (ref. machine
> "freebeast").
>
> > David Wolfskill reported the same: some are affected, others not.
> > It's unclear why, exactly, but all the other details you gave track
> > with the troubleshooting tsoome and I have been doing with him.
>
> I suppose it's a bit of a relief to know I'm not alone in this. :-}
>
> > The issue is inside of loader.efi or /boot/loader, not in the earlier
> > boot blocks.
> > ....
>
> I've ended up putting a gzipped dd image of the full file system from
> /dev/ada0s4a up at https://www.catwhisker.org/~david/FreeBSD/head/loader/

I don't think this is a filesystem problem. All my / filesystems trace 
their ancestry back to the same machine decades ago. All were cloned at one 
point using dump | restore. All were newfs'd at some point to update from 
8K blocks to 16K and eventually 32K blocksize, either using my USB rescue 
disk or a little bit of geom_mirror musical chairs. All four machines are 
set up the same way, as far as the differing hardware and mirrors allow 
(the laptop has a single disk).

Another reason I don't think this is a filesystem problem is that the root 
filesystem of the laptop was temporarily backed up while booted on the USB 
rescue disk, newfs was run on the slice, and the backup was restored. It 
is now a brand new UFS filesystem, and the problem still occurs with this 
commit.

Also, reverting this commit resolved the problem. My guess is that a 
malloc() may be failing on the Acer/Intel laptop while succeeding on the 
Asus/AMD machines in my basement.
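
To illustrate the guess above: a minimal sketch, not the loader's actual 
code, of how a failed allocation on a superblock read path would surface 
as a read error even though the on-disk data is fine. The names 
read_sb_sketch, alloc_fn and failing_alloc are hypothetical and exist only 
for this example; the allocator is passed in so that a failure can be 
simulated.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook, so an allocation failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Stands in for a heap that cannot satisfy the request. */
static void *
failing_alloc(size_t size)
{
	(void)size;
	return (NULL);
}

/*
 * Hypothetical stand-in for a superblock read: allocate a buffer of
 * sbsize bytes, copy the data from src into it, and hand it back in *out.
 * Returns 0 on success, EINVAL for bad arguments, and ENOMEM when the
 * allocator fails. In the ENOMEM case *out is left untouched.
 */
static int
read_sb_sketch(const void *src, size_t sbsize, alloc_fn alloc, void **out)
{
	void *buf;

	if (src == NULL || out == NULL || alloc == NULL || sbsize == 0)
		return (EINVAL);
	buf = alloc(sbsize);
	if (buf == NULL)
		return (ENOMEM);	/* the data was never even examined */
	memcpy(buf, src, sbsize);
	*out = buf;
	return (0);
}
```

With the same input buffer, read_sb_sketch() returns ENOMEM when given 
failing_alloc and 0 when given malloc, which is the kind of 
machine-dependent difference I have in mind.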


-- 
Cheers,
Cy Schubert <Cy.Schubert@komquats.com> or <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e**(i*pi)+1=0