Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
- Reply: Dag-Erling_Smørgrav : "Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]"
- In reply to: Mark Millard : "Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]"
Date: Tue, 26 Nov 2024 08:21:40 UTC
On Nov 25, 2024, at 22:10, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Top posting going in a different direction that
>> established a way to control the behavior in my
>> context . . .
>
> For folks new to the discoveries: the context here
> is poudriere bulk builds, for USE_TMPFS=all vs.
> USE_TMPFS=no . My test context is amd64 on a
> 7950X3D system with 192 GiBytes of RAM. Others have
> other contexts, including an Intel system.
>
>> I changed USE_TMPFS=all to USE_TMPFS=no :
>>
>> USE_TMPFS=all gets the failure
>
> Note: The test case is corruption of the likes of parts of
> the .got.plt in libsass.so.1.0.0 from textproc/libsass .
> The corruptions are 4 KiByte aligned blocks of zeros
> showing up in files where they do not belong.
>
> 2 examples of bad libsass.so.1.0.0 builds have:
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000 ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000 ................
> 2befd0 00000000 00000000 00000000 00000000 ................
> 2befe0 00000000 00000000 00000000 00000000 ................
> 2beff0 00000000 00000000 00000000 00000000 ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000 ..*.......*.....
> 2bf010 b6ab2a00 00000000 c6ab2a00 00000000 ..*.......*.....
> 2bf020 d6ab2a00 00000000 e6ab2a00 00000000 ..*.......*.....
> 2bf030 f6ab2a00 00000000 06ac2a00 00000000 ..*.......*.....
> . . .
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000 ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000 ................
> 2befd0 00000000 00000000 00000000 00000000 ................
> 2befe0 00000000 00000000 00000000 00000000 ................
> 2beff0 00000000 00000000 00000000 00000000 ................
> 2bf000 00000000 00000000 00000000 00000000 ................
> 2bf010 00000000 00000000 00000000 00000000 ................
> 2bf020 00000000 00000000 00000000 00000000 ................
> 2bf030 00000000 00000000 00000000 00000000 ................
> . . .
> 2bffc0 00000000 00000000 00000000 00000000 ................
> 2bffd0 00000000 00000000 00000000 00000000 ................
> 2bffe0 00000000 00000000 00000000 00000000 ................
> 2bfff0 00000000 00000000 00000000 00000000 ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000 ..*.......*.....
> 2c0010 b6cb2a00 00000000 c6cb2a00 00000000 ..*.......*.....
> 2c0020 d6cb2a00 00000000 e6cb2a00 00000000 ..*.......*.....
> 2c0030 f6cb2a00 00000000 06cc2a00 00000000 ..*.......*.....
> . . .
>
> So: Where the zeros end varies, but the start of
> good data ends up at some 0x...000 offset: a
> multiple of 4 KiBytes.
>
>> vs.
>> USE_TMPFS=no works just fine
>>
>> So it is a FreeBSD system error associated with
>> use of tmpfs .
>
> Recent work on tmpfs includes:
>
> Mon, 09 Sep 2024
> • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> Fri, 04 Oct 2024
> • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> Sun, 13 Oct 2024
> • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> Thu, 24 Oct 2024
> • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
>
> swap_pager (given the reference to it above):
>
> Tue, 08 Oct 2024
> • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> Fri, 11 Oct 2024
> • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> Thu, 24 Oct 2024
> • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
> • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
> • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> Sat, 26 Oct 2024
> • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> Wed, 13 Nov 2024
> • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
>
> I do not know at this time when the corruptions started. The
> above is only suggestive.

With a bulk -i active, but from outside the bulk -i :

# df -m | sort -k6,6 | grep ^tmpfs
tmpfs 182907    0 182907 0% /usr/local/poudriere/data/.m/main-amd64-default
tmpfs 184770 1863 182907 1% /usr/local/poudriere/data/.m/main-amd64-default/ref
tmpfs   2048   45   2002 2% /usr/local/poudriere/data/.m/main-amd64-default/ref/.p
tmpfs 182907    0 182907 0% /usr/local/poudriere/data/.m/main-amd64-default/ref/var/db/ports

Note: bulk -i lands one in /usr/local/poudriere/data/.m/main-amd64-default/ref/

The following is from inside a bulk -i , where I ran a manual
make command after it had built and installed libsass.so.1.0.0 .
The manual make produced a /wrkdirs/ :

# find -s / -name libsass.so.1.0.0 -exec ls -ilodT {} \;
 6417 -rwxr-xr-x 1 root wheel - 42444424 Nov 26 07:24:37 2024 /usr/local/lib/libsass.so.1.0.0
11872 -rwxr-xr-x 1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0
12294 -rwxr-xr-x 1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0 | less
. . .
2bed60 78ba2b00 00000000 00000000 00000000 x.+.............
2bed70 00000000 00000000 86a62a00 00000000 ..........*.....
2bed80 96a62a00 00000000 a6a62a00 00000000 ..*.......*.....
2bed90 b6a62a00 00000000 c6a62a00 00000000 ..*.......*.....
. . .

So the original creation looks okay. But . . .

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0 | less
. . .
2bed60 00000000 00000000 00000000 00000000 ................
2bed70 00000000 00000000 00000000 00000000 ................
2bed80 00000000 00000000 00000000 00000000 ................
2bed90 00000000 00000000 00000000 00000000 ................
. . .
2befc0 00000000 00000000 00000000 00000000 ................
2befd0 00000000 00000000 00000000 00000000 ................
2befe0 00000000 00000000 00000000 00000000 ................
2beff0 00000000 00000000 00000000 00000000 ................
2bf000 96ab2a00 00000000 a6ab2a00 00000000 ..*.......*.....
2bf010 b6ab2a00 00000000 c6ab2a00 00000000 ..*.......*.....
2bf020 d6ab2a00 00000000 e6ab2a00 00000000 ..*.......*.....
2bf030 f6ab2a00 00000000 06ac2a00 00000000 ..*.......*.....
. . .

So: The later, staged copy is a bad copy. Both files are in
the tmpfs. So copying to the staging area makes a corrupted
copy inside the same tmpfs. After that, further copies of
staging's bad copy can be expected to be messed up as well.

===
Mark Millard
marklmi at yahoo.com
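
For anyone wanting to check other files for the same pattern without
reading objdump output by eye, the comparison of a good copy against a
staged copy could be sketched roughly as below. This is only an
illustration, not something used in the testing above: the function
name zeroed_pages, the demo file names, and the fixed 4096-byte page
size are all assumptions. It reports file offsets, which need not
match the section addresses that objdump -s prints.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: 4 KiByte pages, matching the alignment reported above


def zeroed_pages(good_path, suspect_path, page_size=PAGE_SIZE):
    """Return file offsets of pages that are all zeros in suspect_path
    but contain non-zero data in good_path (hypothetical helper)."""
    bad_offsets = []
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        offset = 0
        while True:
            good_page = good.read(page_size)
            suspect_page = suspect.read(page_size)
            if not good_page and not suspect_page:
                break  # both files exhausted
            # any() over bytes is True if at least one byte is non-zero
            if suspect_page and not any(suspect_page) and any(good_page):
                bad_offsets.append(offset)
            offset += page_size
    return bad_offsets


if __name__ == "__main__":
    # Self-contained demo on synthetic files: page 1 of the "staged" copy is zeroed.
    workdir = tempfile.mkdtemp()
    good_file = os.path.join(workdir, "good.bin")
    staged_file = os.path.join(workdir, "staged.bin")
    data = b"\xab" * (3 * PAGE_SIZE)
    with open(good_file, "wb") as f:
        f.write(data)
    with open(staged_file, "wb") as f:
        f.write(data[:PAGE_SIZE] + bytes(PAGE_SIZE) + data[2 * PAGE_SIZE:])
    for off in zeroed_pages(good_file, staged_file):
        print(f"zeroed page at file offset 0x{off:x}")
```

Pointing the two arguments at the .libs copy and the stage copy of
libsass.so.1.0.0 would list the offsets where the staged file lost
data, assuming the .libs copy is still intact.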