Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
- Reply: Dag-Erling_Smørgrav : "Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]"
- In reply to: Mark Millard : "Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]"
Date: Tue, 26 Nov 2024 08:21:40 UTC
On Nov 25, 2024, at 22:10, Mark Millard <marklmi@yahoo.com> wrote:
> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Top posting going in a different direction that
>> established a way to control the behavior in my
>> context . . .
>
> For folks new to the discoveries: the context here
> is poudriere bulk builds, for USE_TMPFS=all vs.
> USE_TMPFS=no . My test context is amd64 on a
> 7950X3D system with 192 GiBytes of RAM. Others have
> other contexts, including an Intel system.
>
>> I changed USE_TMPFS=all to USE_TMPFS=no :
>>
>> USE_TMPFS=all gets the failure
>
> Note: The test case is corruption of, for example, parts of
> the .got.plt in libsass.so.1.0.0 from textproc/libsass .
> The corruptions are 4 KiByte aligned blocks of zeros
> showing up in files where the data should not be zero.
>
> 2 examples of bad libsass.so.1.0.0 builds have:
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000 ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000 ................
> 2befd0 00000000 00000000 00000000 00000000 ................
> 2befe0 00000000 00000000 00000000 00000000 ................
> 2beff0 00000000 00000000 00000000 00000000 ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000 ..*.......*.....
> 2bf010 b6ab2a00 00000000 c6ab2a00 00000000 ..*.......*.....
> 2bf020 d6ab2a00 00000000 e6ab2a00 00000000 ..*.......*.....
> 2bf030 f6ab2a00 00000000 06ac2a00 00000000 ..*.......*.....
> . . .
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000 ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000 ................
> 2befd0 00000000 00000000 00000000 00000000 ................
> 2befe0 00000000 00000000 00000000 00000000 ................
> 2beff0 00000000 00000000 00000000 00000000 ................
> 2bf000 00000000 00000000 00000000 00000000 ................
> 2bf010 00000000 00000000 00000000 00000000 ................
> 2bf020 00000000 00000000 00000000 00000000 ................
> 2bf030 00000000 00000000 00000000 00000000 ................
> . . .
> 2bffc0 00000000 00000000 00000000 00000000 ................
> 2bffd0 00000000 00000000 00000000 00000000 ................
> 2bffe0 00000000 00000000 00000000 00000000 ................
> 2bfff0 00000000 00000000 00000000 00000000 ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000 ..*.......*.....
> 2c0010 b6cb2a00 00000000 c6cb2a00 00000000 ..*.......*.....
> 2c0020 d6cb2a00 00000000 e6cb2a00 00000000 ..*.......*.....
> 2c0030 f6cb2a00 00000000 06cc2a00 00000000 ..*.......*.....
> . . .
>
> So: Where the zeros end varies, but the start of
> good data ends up at some 0x...000 offset: a
> multiple of 4 KiBytes.
>
>> vs.
>> USE_TMPFS=no works just fine
>>
>> So it is a FreeBSD system error associated with
>> use of tmpfs .
>
> Recent work on tmpfs includes:
>
> Mon, 09 Sep 2024
> • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> Fri, 04 Oct 2024
> • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> Sun, 13 Oct 2024
> • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> Thu, 24 Oct 2024
> • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
>
> swap_pager (given the reference to it above):
>
> Tue, 08 Oct 2024
> • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> Fri, 11 Oct 2024
> • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> Thu, 24 Oct 2024
> • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
> • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
> • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> Sat, 26 Oct 2024
> • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> Wed, 13 Nov 2024
> • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
>
> I do not know at this time when the corruptions started. The
> above is only suggestive.
With a bulk -i active but from outside the bulk -i :
# df -m | sort -k6,6 | grep ^tmpfs
tmpfs 182907 0 182907 0% /usr/local/poudriere/data/.m/main-amd64-default
tmpfs 184770 1863 182907 1% /usr/local/poudriere/data/.m/main-amd64-default/ref
tmpfs 2048 45 2002 2% /usr/local/poudriere/data/.m/main-amd64-default/ref/.p
tmpfs 182907 0 182907 0% /usr/local/poudriere/data/.m/main-amd64-default/ref/var/db/ports
Note: bulk -i lands one in /usr/local/poudriere/data/.m/main-amd64-default/ref/
From inside a bulk -i, where I ran a manual make command
after it had built and installed libsass.so.1.0.0 (the
manual make produced a /wrkdirs/ tree):
# find -s / -name libsass.so.1.0.0 -exec ls -ilodT {} \;
6417 -rwxr-xr-x 1 root wheel - 42444424 Nov 26 07:24:37 2024 /usr/local/lib/libsass.so.1.0.0
11872 -rwxr-xr-x 1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0
12294 -rwxr-xr-x 1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0
# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0 | less
. . .
2bed60 78ba2b00 00000000 00000000 00000000 x.+.............
2bed70 00000000 00000000 86a62a00 00000000 ..........*.....
2bed80 96a62a00 00000000 a6a62a00 00000000 ..*.......*.....
2bed90 b6a62a00 00000000 c6a62a00 00000000 ..*.......*.....
. . .
So the original creation looks okay. But . . .
# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0 | less
. . .
2bed60 00000000 00000000 00000000 00000000 ................
2bed70 00000000 00000000 00000000 00000000 ................
2bed80 00000000 00000000 00000000 00000000 ................
2bed90 00000000 00000000 00000000 00000000 ................
. . .
2befc0 00000000 00000000 00000000 00000000 ................
2befd0 00000000 00000000 00000000 00000000 ................
2befe0 00000000 00000000 00000000 00000000 ................
2beff0 00000000 00000000 00000000 00000000 ................
2bf000 96ab2a00 00000000 a6ab2a00 00000000 ..*.......*.....
2bf010 b6ab2a00 00000000 c6ab2a00 00000000 ..*.......*.....
2bf020 d6ab2a00 00000000 e6ab2a00 00000000 ..*.......*.....
2bf030 f6ab2a00 00000000 06ac2a00 00000000 ..*.......*.....
. . .
So: The later, staged copy is a bad copy. Both are in the
tmpfs. So copying to the staging area makes a corrupted
copy inside the same tmpfs. After that, further copies of
staging's bad copy can be expected to be messed up.
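The original-vs-staged comparison above can also be done by file offset rather than by reading objdump output. What follows is only a minimal sketch, not something used in the testing reported here: the function name zeroed_pages and the 4096-byte page size are assumptions made for illustration. It lists the offsets of pages that are all zeros in the suspect copy but not in the known-good copy.

```python
PAGE = 4096  # assumed page size (4 KiBytes, matching the alignment seen above)


def zeroed_pages(good_path, suspect_path, page=PAGE):
    """Return file offsets of pages that are entirely zero in
    suspect_path but hold different data in good_path."""
    hits = []
    with open(good_path, "rb") as good, open(suspect_path, "rb") as bad:
        offset = 0
        while True:
            g = good.read(page)
            b = bad.read(page)
            if not g and not b:
                break  # both files exhausted
            # A hit is a non-empty, all-zero block that differs from the good copy.
            if b and g != b and not any(b):
                hits.append(offset)
            offset += page
    return hits
```

Calling zeroed_pages() with the src/.libs/ path as the good file and the work/stage/ path as the suspect file would be expected to report one or more page-aligned offsets for a bad copy and an empty list for a good one. Note that these are file offsets, while the addresses shown by objdump -s are section addresses, so the numbers need not match.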
===
Mark Millard
marklmi at yahoo.com