Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 26 Nov 2024 08:21:40 UTC
On Nov 25, 2024, at 22:10, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> Top posting going in a different direction that
>> established a way to control the behavior in my
>> context . . .
> 
> For folks new to the discoveries: the context here
> is poudriere bulk builds, for USE_TMPFS=all vs.
> USE_TMPFS=no . My test context is amd64 on a
> 7950X3D system with 192 GiBytes of RAM. Others have
> other contexts, including an Intel system.
> 
>> I changed USE_TMPFS=all to USE_TMPFS=no :
>> 
>> USE_TMPFS=all gets the failure
> 
> Note: The test case is corruption of, for example, parts of
> the .got.plt in libsass.so.1.0.0 from textproc/libsass .
> The corruptions are 4 KiByte aligned blocks of zeros
> showing up in files where the data should not be all zeros.
> 
> 2 examples of bad libsass.so.1.0.0 builds have:
> 
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
> 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
> 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
> 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
> . . .
> 
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 00000000 00000000 00000000 00000000  ................
> 2bf010 00000000 00000000 00000000 00000000  ................
> 2bf020 00000000 00000000 00000000 00000000  ................
> 2bf030 00000000 00000000 00000000 00000000  ................
> . . .
> 2bffc0 00000000 00000000 00000000 00000000  ................
> 2bffd0 00000000 00000000 00000000 00000000  ................
> 2bffe0 00000000 00000000 00000000 00000000  ................
> 2bfff0 00000000 00000000 00000000 00000000  ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
> 2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
> 2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
> 2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
> . . .
> 
> So: Where the zeros end varies, but in each case the
> good data resumes at some 0x...000 offset: a
> multiple of 4 KiBytes.
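
For anyone wanting to check their own build output for this
pattern, here is a minimal sketch, not something used in the
tests reported here. It lists the runs of 4 KiByte aligned,
all-zero blocks in a file. The names PAGE and zero_page_runs
are hypothetical, and the 4096-byte block size is an assumption
taken from the alignment seen in the dumps. It works on file
offsets, which need not equal the section addresses that
objdump -s prints, and a healthy file can also contain
all-zero blocks, so a hit is only a hint of where to look.

```python
PAGE = 4096  # assumed block size (4 KiBytes), per the alignment in the dumps


def zero_page_runs(data: bytes, page: int = PAGE):
    """Return (start, end) file-offset pairs for maximal runs of all-zero pages."""
    runs = []
    run_start = None
    zero = bytes(page)
    for off in range(0, len(data), page):
        chunk = data[off:off + page]
        # The last chunk may be shorter than a full page.
        is_zero = chunk == zero[:len(chunk)]
        if is_zero and run_start is None:
            run_start = off                  # a run of zero pages begins
        elif not is_zero and run_start is not None:
            runs.append((run_start, off))    # the run ended at this page boundary
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(data)))  # the run extends to end of file
    return runs


def report(path: str) -> None:
    """Print the zero-page runs of one file as hexadecimal offsets."""
    with open(path, "rb") as f:
        for start, end in zero_page_runs(f.read()):
            print(f"{path}: zeros 0x{start:x}..0x{end:x}")
```

Calling report() on a suspect libsass.so.1.0.0 prints one line
per run of zero blocks. Comparing that output between a
USE_TMPFS=no build and a USE_TMPFS=all build would show which
runs are unexpected.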
> 
>> vs.
>> USE_TMPFS=no works just fine
>> 
>> So it is a FreeBSD system error associated with
>> use of tmpfs .
> 
> Recent work on tmpfs includes:
> 
> Mon, 09 Sep 2024
> • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> Fri, 04 Oct 2024
> • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> Sun, 13 Oct 2024
> • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> Thu, 24 Oct 2024
> • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
> 
> swap_pager (given the reference to it above):
> 
> Tue, 08 Oct 2024
>    • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> Fri, 11 Oct 2024
>    • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> Thu, 24 Oct 2024
>    • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
>    • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore 
>    • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> Sat, 26 Oct 2024
>    • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> Wed, 13 Nov 2024
>    • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
> 
> I do not know at this time when the corruptions started. The
> above is only suggestive.

With a bulk -i session active, but with the command run from
outside the bulk -i :

# df -m | sort -k6,6 | grep ^tmpfs
tmpfs  182907     0  182907   0%  /usr/local/poudriere/data/.m/main-amd64-default
tmpfs  184770  1863  182907   1%  /usr/local/poudriere/data/.m/main-amd64-default/ref
tmpfs    2048    45    2002   2%  /usr/local/poudriere/data/.m/main-amd64-default/ref/.p
tmpfs  182907     0  182907   0%  /usr/local/poudriere/data/.m/main-amd64-default/ref/var/db/ports

Note: bulk -i lands one in /usr/local/poudriere/data/.m/main-amd64-default/ref/


The following is from inside a bulk -i , where I ran a manual
make command after the bulk -i had built and installed
libsass.so.1.0.0 . The manual make produced a /wrkdirs/ tree:

# find -s / -name libsass.so.1.0.0 -exec ls -ilodT {} \;
6417 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:24:37 2024 /usr/local/lib/libsass.so.1.0.0
11872 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0
12294 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0 | less
. . .
 2bed60 78ba2b00 00000000 00000000 00000000  x.+.............
 2bed70 00000000 00000000 86a62a00 00000000  ..........*.....
 2bed80 96a62a00 00000000 a6a62a00 00000000  ..*.......*.....
 2bed90 b6a62a00 00000000 c6a62a00 00000000  ..*.......*.....
. . .

So the original creation looks okay. But . . .

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0 | less
. . .
 2bed60 00000000 00000000 00000000 00000000  ................
 2bed70 00000000 00000000 00000000 00000000  ................
 2bed80 00000000 00000000 00000000 00000000  ................
 2bed90 00000000 00000000 00000000 00000000  ................
. . .
 2befc0 00000000 00000000 00000000 00000000  ................
 2befd0 00000000 00000000 00000000 00000000  ................
 2befe0 00000000 00000000 00000000 00000000  ................
 2beff0 00000000 00000000 00000000 00000000  ................
 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
. . .

So: The later, staged copy is a bad copy. Both are in the
tmpfs. So copying to the staging area makes a corrupted
copy inside the same tmpfs. After that, further copies of
staging's bad copy can be expected to carry the same corruption.
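
The good-vs-staged comparison above was done by eye from the
objdump output. Here is a minimal sketch of doing the same
check mechanically. It is an illustration only, not what was
run here. It walks two files in 4 KiByte steps and reports
the offsets where the second file has an all-zero block and
the first does not. The names PAGE and zeroed_pages are
hypothetical, and the 4096-byte step is an assumption based
on the alignment seen in the dumps.

```python
PAGE = 4096  # assumed block size (4 KiBytes), per the alignment in the dumps


def zeroed_pages(good_path: str, bad_path: str, page: int = PAGE):
    """Return file offsets of pages that are all zeros in bad_path only."""
    offsets = []
    zero = bytes(page)
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        off = 0
        while True:
            g = good.read(page)
            b = bad.read(page)
            if not g and not b:
                break  # both files are exhausted
            # Flag a page only if the copy differs from the original and
            # the copy's page holds nothing but zero bytes.
            if b and g != b and b == zero[:len(b)]:
                offsets.append(off)
            off += page
    return offsets
```

For example, zeroed_pages() given the src/.libs/ path and then
the work/stage/ path from the find output above would list the
file offsets of the blocks that were zeroed by the copy. An
empty list would mean no such blocks were found.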


===
Mark Millard
marklmi at yahoo.com