EDK2 and RPi5: sudden 1000+ inode check-hash failures and such on occasion

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 21 Jan 2024 07:28:09 UTC
In experimenting with the RPi5 via the existing EDK2 I've had
multiple occasions when the system suddenly had 1000+ inode
check-hash failures and such shown during activity that
scans directories and files.

Rebooting via boot -s and using fsck_ffs -y / agrees about
there being such. There had been earlier fsck_ffs / runs that
passed.

So far the experiments have been with main.

Some recent activity has been since I started experimenting
with pkg base and with the port packages installed. This led
to being able to use:

# pkg check -s

which ends up listing missing files. It looks like
when a file is missing in a directory, a bunch are
typically missing from that same directory. The
examples outside /usr/local/ in the recent case
were inside /usr/src/ . It also got:

FreeBSD-clang-dbg-15.snap20240120223308: checksum mismatch for /usr/lib/debug/usr/bin/llvm-nm.debug
g_vfs_done():ufs/rootfs[READ(offset=-432310823128264704, length=32768)]error = 5
g_vfs_done():ufs/rootfs[READ(offset=-432310823128264704, length=32768)]error = 5
g_vfs_done():ufs/rootfs[READ(offset=-432310823128264704, length=32768)]error = 5
g_vfs_done():ufs/rootfs[READ(offset=-432310823128264704, length=32768)]error = 5
pkg_checksum_hash_sha256_file(read failed): Input/output error
FreeBSD-clang-dbg-15.snap20240120223308: checksum mismatch for /usr/lib/debug/usr/bin/llvm-objcopy.debug

In more activity I've gotten:

[1/2] Installing FreeBSD-src-15.snap20240120223308...
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry
/: bad dir ino 12259846 at offset 0: mangled entry
/: bad dir ino 12259846 at offset 512: mangled entry


It appears that installing/updating the likes of
FreeBSD-src and FreeBSD-src-sys leads to an fsck_ffs /
(no writes) just afterwards failing with large numbers
of errors. (Many may well be consequences of others,
rather than being independent.)

But doing shutdown now, mount -r /, poweroff and
booting the same media in a RPi4B via the normal
U-Boot UEFI/fdt style did not find failures via
"fsck_ffs -y /" (writable). "pkg check -s -a"
after going back to multi-user mode also did not
complain.

This suggests that the RPi5 was seeing problems in
memory before much of the media had been updated
with bad information (and that the "shutdown now"
did not write out [much?] problematical data).

Same media but on the RPi4B using U-Boot 2024.01's
UEFI/fdt : "pkg upgrade -f" did not lead to
any problems being detected in a following
"fsck_ffs /" (no write) after rebooting.

The above had been using the kernel-GENERIC-NODEBUG .
I tried kernel-GENERIC. It gets the same kinds of
RPi5 problems but does not report any other debug
information while doing so.


I wonder if:

hw.busdma.zone0.alignment: 524288
(a.k.a. 0x7FFFFu+0x1u)

that happens for the EDK2 context on the RPi5
contributes to any problems handling things.
For hw.busdma.zone0.lowaddr 0xffffffff there
are only 8192 positions with the indicated
alignment and fitting in the 32-bit address
subrange involved.
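
As a quick check of that arithmetic, here is a minimal Python
sketch. The two constants are copied from the sysctl values
quoted in this message; the function name aligned_positions is
just something made up for illustration, not anything from
the kernel.

```python
# Values copied from the hw.busdma sysctl output quoted in this message.
ALIGNMENT = 524288      # hw.busdma.zone0.alignment
LOWADDR = 0xFFFFFFFF    # hw.busdma.zone0.lowaddr


def aligned_positions(lowaddr: int, alignment: int) -> int:
    """Count the addresses in [0, lowaddr] that are multiples of alignment."""
    return lowaddr // alignment + 1


# The alignment is 0x7FFFF + 0x1, i.e. 0x80000 (512 KiB).
assert ALIGNMENT == 0x7FFFF + 0x1

print(aligned_positions(LOWADDR, ALIGNMENT))  # prints 8192
```

Whether so few candidate positions actually matters for the
bounce-buffer handling is the open question; the sketch only
confirms the count.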

An earlier report of mine showing example
hw.busdma information from a system that had
been largely idle since boot was:

# sysctl hw.busdma
hw.busdma.zone0.total_deferred_time: 0 0
hw.busdma.zone0.domain: 0
hw.busdma.zone0.alignment: 524288
hw.busdma.zone0.lowaddr: 0xffffffff
hw.busdma.zone0.total_deferred: 0
hw.busdma.zone0.total_bounced: 12018773
hw.busdma.zone0.active_bpages: 12
hw.busdma.zone0.reserved_bpages: 0
hw.busdma.zone0.free_bpages: 1227
hw.busdma.zone0.total_bpages: 1239
hw.busdma.total_bpages: 1239
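
The bounce-page counters above can be cross-checked with a
small Python sketch. It assumes (my assumption, not something
verified against the busdma code here) that the active,
reserved and free counts are expected to add up to the zone
total. The helper name parse_sysctl is hypothetical and the
text is just the relevant lines pasted from the output above.

```python
# Relevant lines pasted from the sysctl hw.busdma output above.
SYSCTL_TEXT = """\
hw.busdma.zone0.active_bpages: 12
hw.busdma.zone0.reserved_bpages: 0
hw.busdma.zone0.free_bpages: 1227
hw.busdma.zone0.total_bpages: 1239
"""


def parse_sysctl(text: str) -> dict:
    """Turn 'name: value' lines into a dict of name -> value strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            values[name] = value.strip()
    return values


zone = parse_sysctl(SYSCTL_TEXT)
prefix = "hw.busdma.zone0."

# Assumption: active + reserved + free should account for every bounce page.
accounted = sum(
    int(zone[prefix + kind + "_bpages"]) for kind in ("active", "reserved", "free")
)
total = int(zone[prefix + "total_bpages"])

print(accounted, total, accounted == total)  # prints 1239 1239 True
```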


===
Mark Millard
marklmi at yahoo.com