Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]

From: Sean C. Farley <scf_at_FreeBSD.org>
Date: Fri, 29 Nov 2024 03:54:32 UTC

On Thu, 28 Nov 2024, Mark Millard wrote:

> Sean C. Farley <scf_at_FreeBSD.org> wrote on
> Date: Thu, 28 Nov 2024 18:16:16 UTC :
>
>> On Mon, 25 Nov 2024, Mark Millard wrote:
>>
>>> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> Top posting going in a different direction that
>>>> established a way to control the behavior in my
>>>> context . . .
>>>
>>> For folks new to the discoveries: the context here
>>> is poudriere bulk builds, for USE_TMPFS=all vs.
>>> USE_TMPFS=no . My test context is amd64 on a
>>> 7950X3D system with 192 GiBytes of RAM. Others have
>>> other contexts, including an Intel system.

*snip*

>> System setup:
>> - FreeBSD 14.2-STABLE
>
> The context that I investigated --and what was fixed by a commit only
> applies to-- main [so; 15 as stands], not stable/14 .
>
> stable/14 has no commits mentioning "tmpfs" after 2024-Jun-04.

Thank you.  That was my mistake.  I will keep searching for an
answer.  Once I find a way to trigger the problem more consistently,
it will be much easier to track down.

I ran all of the tmpfs*.sh tests from HEAD.  All of them pass except
tmpfs24.sh:

$ ./all.sh -o tmpfs24.sh
20241128 22:33:38 all: tmpfs24.sh
Min hole size is 4096, file size is 524288000.
data #1 @ 0, size=4096)
hole #2 @ 4096, size=4096
data #3 @ 8192, size=4096)
hole #4 @ 12288, size=4096
data #5 @ 16384, size=4096)
hole #6 @ 20480, size=524267520
--- /tmp/tmpfs24.exp    2024-11-28 22:33:40.222199000 -0500
+++ /tmp/tmpfs24.log    2024-11-28 22:33:40.225048000 -0500
@@ -5,4 +5,3 @@
  hole #4 @ 12288, size=4096
  data #5 @ 16384, size=4096)
  hole #6 @ 20480, size=524267520
-<<Missing EOF hole>>
FAIL tmpfs24.sh exit code 1
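
For anyone unfamiliar with the data/hole lines above: they look like the
result of walking a sparse file with lseek(2) using SEEK_DATA and
SEEK_HOLE.  Below is a minimal Python sketch of that kind of walk.  It
is my assumption about the general technique, not the code tmpfs24.sh
actually runs, and list_segments is a hypothetical helper name.

```python
import errno
import os


def list_segments(path):
    """Return [(kind, offset, size), ...] covering the whole file.

    kind is "data" or "hole".  Hypothetical helper for illustration only;
    it is not taken from tmpfs24.sh.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.fstat(fd).st_size
        pos = 0
        while pos < end:
            # Find the next data region at or after pos.
            try:
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                data = end  # ENXIO: no data after pos, the rest is a hole
            if data > pos:
                segments.append(("hole", pos, data - pos))
                pos = data
                continue
            # pos is inside data; find where that data region stops.
            hole = min(os.lseek(fd, pos, os.SEEK_HOLE), end)
            if hole <= pos:
                break  # defensive: never loop without making progress
            segments.append(("data", pos, hole - pos))
            pos = hole
    finally:
        os.close(fd)
    return segments
```

On a file system without hole support, this would be expected to report
the entire file as a single data segment, since SEEK_HOLE then only
finds the implicit hole at end of file.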

>> - i7-14700K (latest BIOS which *should* fix Intel power-related bugs)
>> - 128 GiB RAM
>> - ZFS (mirrored drives)
>
> The primary test context was ZFS but no redundancy or such. (Only
> really used for bectl activity.) But testing on a UFS copy of
> the live directory tree also got the problem. The actual problem
> was in tmpfs support.

That was what I thought from what I read, but I wanted to make sure I
did not leave out an important detail.

>> - 2 encrypted swap partitions (64 GiB each, lightly used)
>
> No encryption involved in my context at all.
>
>> - Lightly undervolted (-0.06 offset to Global Core SVID Voltage)
>
> Nothing analogous in my context.
>
>> - /tmp is tmpfs
>
> I have no default areas that are tmpfs: so only what
> poudriere temporarily created during the bulk builds.
>
>> - ${HOME}/.cache is tmpfs
>
> No use of ccache or the like.
>
>> - Poudriere:
>> - USE_TMPFS=all
>
> I also use TMPFS_BLACKLIST .
>
> My personal environment causes use of -gline-tables-only as
> debug information normally. (That option is clang/clang++
> specific. gcc* and clang* do not seem to have a common
> notation for analogous settings on the command line.)
>
>> - ccache
>
> No use of ccache or the like.
>
>>    - jail version in sync with host
>
> True for my context. But the issue that was fixed was
> in the kernel code, not the world code.
>
>> - /usr/ports is mounted with nullfs
>
> Also true for my context.

I appreciate that information.

*snip build failure*

> None of this is directly stable/14 :  all main
> [so: 15 as stands].
>
> stable/14 has no commits mentioning "tmpfs" after 2024-Jun-04. So
> none of these changes are involved for stable/14 .

It was a long shot on my part anyway.  :)

Sean
-- 
scf@FreeBSD.org