Re: Still seeing Failed assertion: "p[i] == 0" on armv7 buildworld
Date: Fri, 14 Nov 2025 17:16:57 UTC
On Nov 14, 2025, at 07:25, bob prohaska <fbsd@www.zefox.net> wrote:
> On Thu, Nov 13, 2025 at 11:16:56PM -0800, Carl Shapiro wrote:
>> bob prohaska <fbsd@www.zefox.net> writes:
>>
>>> All the assertion failures I've seen have been in the clang libraries during
>>> buildworld. They appear to happen in a variety of cases, indicated by the
>>> different .sh and .cpp filenames found in the files under
>>> http://www.zefox.net/~fbsd/assertion_failure/
>>
>> Do you have the stdout and stderr of the build somewhere in there as
>> well? The make(1) invocation in the readme file shows its output being
>> redirected to a file.
>
> Those files have been overwritten by restarting the buildworld sessions.
> They tend to be large and difficult to synchronize with the .cpp and .sh
> files generated by the crash. It could be done if it's useful.
>
>>
>> The assert you mentioned in the subject of your e-mail message, which I
>> also saw in the readme file, could come from jemalloc. See these lines
>> of code for context:
>>
>> https://github.com/facebook/jemalloc/blob/dev/src/extent.c#L805-L814
>>
>> That assertion will be tripped when jemalloc sees non-zero memory that
>> it expects to be zeroed. See for example
>>
>> https://github.com/facebook/jemalloc/blob/dev/src/pages.c#L55-L106
>>
>> Looking at the code, my hypothesis would be that jemalloc thinks it's
>> committing memory for the first time but the memory is coming back with
>> non-zero data.
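To make that failure mode concrete, here is a minimal C sketch of the
kind of check Carl describes. The function names are hypothetical and
this is not jemalloc's actual code. It only assumes that the allocator
scans, word by word, memory it believes the kernel handed back
zero-filled, and asserts on the first non-zero word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not jemalloc's code): report whether every
 * size_t-sized word in [addr, addr + size) is zero. Assumes addr is
 * aligned for size_t (pages always are); trailing bytes beyond the
 * last whole word are not examined.
 */
bool
range_is_zeroed(const void *addr, size_t size)
{
	const size_t *p = addr;

	for (size_t i = 0; i < size / sizeof(size_t); i++) {
		if (p[i] != 0)
			return false;
	}
	return true;
}

/*
 * Debug-style check in the spirit of the one linked above: the memory
 * is believed to be freshly committed, hence zero-filled, so any
 * non-zero word trips "p[i] == 0" and abort()s the process.
 */
void
assert_range_zeroed(const void *addr, size_t size)
{
	const size_t *p = addr;

	for (size_t i = 0; i < size / sizeof(size_t); i++)
		assert(p[i] == 0);
}
```

In a debug build, stale data left in supposedly fresh pages would make
assert_range_zeroed() abort rather than silently continue.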
>>
>> Just curious, but is over-commit enabled on your system? Here is the
>> check jemalloc uses:
>>
>> https://github.com/facebook/jemalloc/blob/dev/src/pages.c#L729-L737
>>
>
> Sysctl -a reports in part:
> # sysctl -a | grep -i overcommit
> sysctl: S_vmtotal 48 != 88
The S_vmtotal line above comes from sysctl's handling
of vm.vmtotal, i.e., what
sysctl vm.vmtotal
would report: the output for
"System wide totals computed every five seconds".
That S_vmtotal line is an internal warning from
sysctl. The 88 is correct: it is sizeof(struct vmtotal)
from sys/sys/vmmeter.h:
(kgdb) ptype /o *(struct vmtotal*)0
/* offset      |    size */  type = struct vmtotal {
/*      0      |       8 */    uint64_t t_vm;
/*      8      |       8 */    uint64_t t_avm;
/*     16      |       8 */    uint64_t t_rm;
/*     24      |       8 */    uint64_t t_arm;
/*     32      |       8 */    uint64_t t_vmshr;
/*     40      |       8 */    uint64_t t_avmshr;
/*     48      |       8 */    uint64_t t_rmshr;
/*     56      |       8 */    uint64_t t_armshr;
/*     64      |       8 */    uint64_t t_free;
/*     72      |       2 */    int16_t t_rq;
/*     74      |       2 */    int16_t t_dw;
/*     76      |       2 */    int16_t t_pw;
/*     78      |       2 */    int16_t t_sl;
/*     80      |       2 */    int16_t t_sw;
/*     82      |       6 */    uint16_t t_pad[3];

                               /* total size (bytes):   88 */
                             }
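As a cross-check of that layout, here is a small C sketch. It mirrors
the kgdb output in a local struct (the name vmtotal_copy is made up; it
is not the real header) and checks the size at compile time. The
check_vmtotal_len() function is a hypothetical stand-in for the length
comparison that, judging from the message text, sysctl(8) makes before
trusting the returned buffer: 48 bytes back where 88 are expected
produces the "S_vmtotal 48 != 88" style warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirror of the layout kgdb printed above (sys/sys/vmmeter.h). */
struct vmtotal_copy {
	uint64_t t_vm, t_avm, t_rm, t_arm;
	uint64_t t_vmshr, t_avmshr, t_rmshr, t_armshr;
	uint64_t t_free;
	int16_t  t_rq, t_dw, t_pw, t_sl, t_sw;
	uint16_t t_pad[3];
};

_Static_assert(sizeof(struct vmtotal_copy) == 88, "expected 88 bytes");
_Static_assert(offsetof(struct vmtotal_copy, t_rq) == 72, "t_rq at 72");
_Static_assert(offsetof(struct vmtotal_copy, t_pad) == 82, "t_pad at 82");

/*
 * Hypothetical stand-in for sysctl(8)'s length check: the length the
 * kernel reported back must equal sizeof(struct vmtotal) before the
 * fields are used. Returns 0 if OK, 1 (after warning) otherwise.
 */
int
check_vmtotal_len(size_t returned_len)
{
	if (returned_len != sizeof(struct vmtotal_copy)) {
		fprintf(stderr, "S_vmtotal %zu != %zu\n",
		    returned_len, sizeof(struct vmtotal_copy));
		return 1;
	}
	return 0;
}
```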
The 48, the length that the internal sysctl(. . .)
call returned, is wrong. The message also indicates
that the normal associated output was not generated
for vm.vmtotal .
I do not know if the error is somehow associated with
your overlarge swap space (if you still have that).
In my context "sysctl vm.vmtotal" and "sysctl -a"
are working normally.
> vm.overcommit: 0
"man 7 tuning" reports about vm.overcommit:
     The vm.overcommit sysctl defines the overcommit behaviour of the vm
     subsystem. The virtual memory system always does accounting of the swap
     space reservation, both total for system and per-user. Corresponding
     values are available through sysctl vm.swap_total, that gives the total
     bytes available for swapping, and vm.swap_reserved, that gives number of
     bytes that may be needed to back all currently allocated anonymous
     memory.
     Setting bit 0 of the vm.overcommit sysctl causes the virtual memory
     system to return failure to the process when allocation of memory causes
     vm.swap_reserved to exceed vm.swap_total. Bit 1 of the sysctl enforces
     RLIMIT_SWAP limit (see getrlimit(2)). Root is exempt from this limit.
     Bit 2 allows to count most of the physical memory as allocatable, except
     wired and free reserved pages (accounted by vm.stats.vm.v_free_target and
     vm.stats.vm.v_wire_count sysctls, respectively).
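Decoding your value against those three bits: vm.overcommit: 0 has none
of them set, so neither the swap-reservation limit nor RLIMIT_SWAP is
enforced. By that description, allocations are not refused for
exceeding vm.swap_total, i.e., overcommit is allowed. A small C sketch
of the decoding follows; the enum and function names are made up for
illustration and only follow the bit meanings quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit assignments as described by tuning(7) for vm.overcommit. */
enum {
	OVERCOMMIT_SWAP_RESERVE = 1 << 0, /* fail allocs past vm.swap_total */
	OVERCOMMIT_RLIMIT_SWAP  = 1 << 1, /* enforce RLIMIT_SWAP (not root) */
	OVERCOMMIT_COUNT_PHYS   = 1 << 2, /* count most RAM as allocatable */
};

/* Hypothetical decoded form of a vm.overcommit value. */
struct overcommit_policy {
	bool enforce_swap_reserve;
	bool enforce_rlimit_swap;
	bool count_physical_memory;
};

struct overcommit_policy
decode_vm_overcommit(int value)
{
	struct overcommit_policy p = {
		.enforce_swap_reserve  = (value & OVERCOMMIT_SWAP_RESERVE) != 0,
		.enforce_rlimit_swap   = (value & OVERCOMMIT_RLIMIT_SWAP) != 0,
		.count_physical_memory = (value & OVERCOMMIT_COUNT_PHYS) != 0,
	};
	return p;
}
```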
> #
> It's unclear whether this implies yes or no, or even whether it is the correct test.
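As for what jemalloc itself concludes: as I read its
os_overcommits_sysctl() in src/pages.c, it treats overcommit as
enabled when neither bit 0 nor bit 1 of vm.overcommit is set, i.e.,
when (vm_overcommit & 0x3) == 0, and ignores bit 2. If that reading is
right, your 0 means jemalloc assumes overcommit. A sketch of just that
predicate (the function name is made up; verify against the linked
source):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the test as I read jemalloc's os_overcommits_sysctl():
 * overcommit counts as "on" when neither bit 0 (swap reservation
 * limit) nor bit 1 (RLIMIT_SWAP) is set. Bit 2 does not matter here.
 */
bool
jemalloc_style_overcommits(int vm_overcommit)
{
	return (vm_overcommit & 0x3) == 0;
}
```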
>
>>> The failures are random in the sense that restarting buildworld produces
>>> either a new assertion failure in a different library or a successful completion.
>>>
>>> It isn't obvious how to capture a stack trace; if you can provide guidance,
>>> I'll give it a try. As is, buildworld simply stops; the machine does not
>>> crash.
>>
>> It might be captured for you already? I noticed files with names
>> containing "symbolizer-input" and "symbolizer-output" like this one
>>
>> http://www.zefox.net/~fbsd/assertion_failure/hostname_pelorus.zefox.org/symbolizer-output-7282d9
>>
>> and the output files contain a stack trace like this
>>
>> llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
>> /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:731:7
>>
>> llvm::sys::RunSignalHandlers()
>> /usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:0:5
>>
>> SignalHandler
>> /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:3
>>
>> handle_signal
>> /usr/src/lib/libthr/thread/thr_sig.c:0:3
>>
>> Any idea who or what is creating those files and when?
>
> The files are deposited in /tmp, apparently by the C compiler as records
> of an internal error in the compiler, usually number 134. My understanding
> is superficial at best.
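The 134 is most likely an exit status rather than an error number.
Common shells, including FreeBSD's sh, report a process killed by
signal N as 128 + N, and 134 - 128 = 6 = SIGABRT, which is what a
failed assert() raises via abort(). That fits the "p[i] == 0"
assertion. A tiny C sketch of the arithmetic (the helper name is made
up):

```c
#include <assert.h>
#include <signal.h>

/*
 * Hypothetical helper: shells report a child killed by signal N with
 * exit status 128 + N. Return N for such a status, or 0 when the
 * status looks like an ordinary exit code (0..128).
 */
int
signal_from_shell_status(int status)
{
	if (status > 128 && status <= 255)
		return status - 128;
	return 0;
}
```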
===
Mark Millard
marklmi at yahoo.com