Re: Still seeing Failed assertion: "p[i] == 0" on armv7 buildworld

In reply to: bob prohaska : "Re: Still seeing Failed assertion: "p[i] == 0" on armv7 buildworld"
From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 14 Nov 2025 20:14:57 UTC
On Nov 14, 2025, at 11:25, bob prohaska <fbsd@www.zefox.net> wrote:

> On Fri, Nov 14, 2025 at 09:16:57AM -0800, Mark Millard wrote:
>> On Nov 14, 2025, at 07:25, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> On Thu, Nov 13, 2025 at 11:16:56PM -0800, Carl Shapiro wrote:
>>>> bob prohaska <fbsd@www.zefox.net> writes:
>>>> 
>>>>> All the assertion failures I've seen have been in the clang libraries during
>>>>> buildworld. They appear to happen in a variety of cases, indicated by the 
>>>>> different .sh and .cpp filenames found in the files under
>>>>> http://www.zefox.net/~fbsd/assertion_failure/
>>>> 
>>>> Do you have the stdout and stderr of the build somewhere in there as
>>>> well?  The make(1) invocation in the readme file shows its output being
>>>> redirected to a file.
>>> 
>>> Those files have been overwritten by restarting the buildworld sessions.
>>> They tend to be large and difficult to synchronize with the .cpp and .sh
>>> files generated by the crash. It could be done if it's useful.
>>> 
>>>> 
>>>> The assert you mentioned in the subject of your e-mail message, which I
>>>> also saw in the readme file, could come from jemalloc.  See these lines
>>>> of code for the context
>>>> 
>>>> https://github.com/facebook/jemalloc/blob/dev/src/extent.c#L805-L814
>>>> 
>>>> That assertion will be tripped when jemalloc sees non-zero memory that
>>>> it expects to be zeroed.  See for example
>>>> 
>>>> https://github.com/facebook/jemalloc/blob/dev/src/pages.c#L55-L106
>>>> 
>>>> Looking at the code, my hypothesis would be that jemalloc thinks it's
>>>> committing memory for the first time but the memory is coming back with
>>>> non-zero data.
>>>> 
>>>> Just curious, but is over-commit enabled on your system?  Here is the
>>>> signal jemalloc is using to check
>>>> 
>>>> https://github.com/facebook/jemalloc/blob/dev/src/pages.c#L729-L737
>>>> 
>>> 
>>> Sysctl -a reports in part:
>>> # sysctl -a | grep -i overcommit
>>> sysctl: S_vmtotal 48 != 88
>> 
>> The S_vmtotal line above is from what
>> 
>> sysctl vm.vmtotal
>> 
>> would report: output for
>> 
>> "System wide totals computed every five seconds".
>> 
>> That reported S_vmtotal line is an internal warning from
>> sysctl. The 88 is correct and is sizeof(struct vmtotal)
>> from sys/sys/vmmeter.h :
>> 
>> (kgdb) ptype /o *(struct vmtotal*)0
>> /* offset      |    size */  type = struct vmtotal {
>> /*      0      |       8 */    uint64_t t_vm;
>> /*      8      |       8 */    uint64_t t_avm;
>> /*     16      |       8 */    uint64_t t_rm;
>> /*     24      |       8 */    uint64_t t_arm;
>> /*     32      |       8 */    uint64_t t_vmshr;
>> /*     40      |       8 */    uint64_t t_avmshr;
>> /*     48      |       8 */    uint64_t t_rmshr;
>> /*     56      |       8 */    uint64_t t_armshr;
>> /*     64      |       8 */    uint64_t t_free;
>> /*     72      |       2 */    int16_t t_rq;
>> /*     74      |       2 */    int16_t t_dw;
>> /*     76      |       2 */    int16_t t_pw;
>> /*     78      |       2 */    int16_t t_sl;
>> /*     80      |       2 */    int16_t t_sw;
>> /*     82      |       6 */    uint16_t t_pad[3];
>> 
>>                               /* total size (bytes):   88 */
>>                             }
>> 
>> The 48 is wrong for what the internal sysctl(. . .)
>> returned. The message also indicates that the
>> normal associated output was not generated for
>> vm.vmtotal.
>> 
>> I do not know if the error is somehow associated with
>> your overlarge swap space (if you still have that).
>> In my context "sysctl vm.vmtotal" and "sysctl -a"
>> are working normally.
> 
> The 48 is likely related to having excess swap space.
> On a machine with 1.77 GB swap the command reports
> root@www:/usr/src # sysctl -a | grep -i overcommit
> vm.overcommit: 0

It still indicates some invalid internal state in
your system. I suggest not running in that state
until your environment no longer has its 2 other
problems (hang-ups and jemalloc assertion
failures).
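
For reference, a minimal sketch of the kind of size comparison
behind the "48 != 88" warning. It assumes the field layout shown
in the kgdb output quoted above. The names vmtotal_mirror and
vmtotal_len_ok are made up for illustration and are not taken
from the sysctl(8) source or from sys/sys/vmmeter.h.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the layout printed by kgdb above.
 * It is NOT the real struct vmtotal; it only reproduces the
 * field order and widths so the expected size can be checked.
 */
struct vmtotal_mirror {
	uint64_t t_vm;
	uint64_t t_avm;
	uint64_t t_rm;
	uint64_t t_arm;
	uint64_t t_vmshr;
	uint64_t t_avmshr;
	uint64_t t_rmshr;
	uint64_t t_armshr;
	uint64_t t_free;
	int16_t  t_rq;
	int16_t  t_dw;
	int16_t  t_pw;
	int16_t  t_sl;
	int16_t  t_sw;
	uint16_t t_pad[3];
};

/*
 * Returns 1 when the length handed back for the structure matches
 * the expected size, 0 otherwise (the "48 != 88" style of mismatch).
 */
static int
vmtotal_len_ok(size_t returned_len)
{
	return returned_len == sizeof(struct vmtotal_mirror);
}
```

With this layout, vmtotal_len_ok(88) holds and vmtotal_len_ok(48)
does not, which is the mismatch the warning reports.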

> I don't think it's related to the assertion failure,
> since that host experiences assertion failures as often
> as hosts with excess swap space.

I recommend avoiding having any reported corruptions
active during the search for solutions to your 2
problems: Avoid guessing about interactions of
oddities.

>>> vm.overcommit: 0
>> 
>> "man 7 tuning" reports about vm.overcommit :
>> 
>>     The vm.overcommit sysctl defines the overcommit behaviour of the vm
>>     subsystem.  The virtual memory system always does accounting of the swap
>>     space reservation, both total for system and per-user.  Corresponding
>>     values are available through sysctl vm.swap_total, that gives the total
>>     bytes available for swapping, and vm.swap_reserved, that gives number of
>>     bytes that may be needed to back all currently allocated anonymous
>>     memory.
>> 
>>     Setting bit 0 of the vm.overcommit sysctl causes the virtual memory
>>     system to return failure to the process when allocation of memory causes
>>     vm.swap_reserved to exceed vm.swap_total.  Bit 1 of the sysctl enforces
>>     RLIMIT_SWAP limit (see getrlimit(2)).  Root is exempt from this limit.
>>     Bit 2 allows to count most of the physical memory as allocatable, except
>>     wired and free reserved pages (accounted by vm.stats.vm.v_free_target and
>>     vm.stats.vm.v_wire_count sysctls, respectively).
>> 
>>> # 
>>> It's unclear if this implies yes or no, or even is the correct test.
> 
> I remain uncertain if overcommit is on or off 8-( It seems like overcommit
> limits are intended to keep one user from exhausting swap on a multiuser
> host. Not my situation, if that's the case.   

It is not about single-user vs. multi-user in general,
only in part (i.e., optionally). Also, some of the bit
mask values are not really about overcommit but are
about somewhat related limitations. More than one of
the options can be enabled at the same time but
normally none are enabled.

A description of overcommit: vm.swap_total < vm.swap_reserved

My notes below are in terms of applying a bit mask to pick out
a bit and have the result be != 0u (set) vs. == 0u (unset).

The mask 0x1u being set would lead to rejection of attempts
to have: vm.swap_total < vm.swap_reserved

You do not have 0x1u set so such rejection is not enabled.
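
A minimal sketch of that 0x1u rule, written only from the
tuning(7) wording quoted earlier. It is a simplified model and
not the kernel's actual reservation code: it ignores the root
exemption, RLIMIT_SWAP, and the 0x4u adjustment. The function
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The overcommit condition as described: vm.swap_total < vm.swap_reserved. */
static bool
swap_overcommitted(uint64_t swap_total, uint64_t swap_reserved)
{
	return swap_total < swap_reserved;
}

/*
 * Simplified model: a reservation is rejected only when mask 0x1u
 * is set AND granting it would leave swap_total < swap_reserved.
 * swap_reserved_after is the reserved total if the request were granted.
 */
static bool
reservation_rejected(unsigned int overcommit, uint64_t swap_total,
    uint64_t swap_reserved_after)
{
	return (overcommit & 0x1u) != 0u &&
	    swap_overcommitted(swap_total, swap_reserved_after);
}
```

With vm.overcommit at 0, reservation_rejected() in this model is
always false, no matter how far the reservation exceeds the total.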

The mask 0x2u involves:

getrlimit(RLIMIT_SWAP, struct rlimit* cur_and_max)

Quoting:

     RLIMIT_SWAP     The maximum size (in bytes) of the swap space that may be
                     reserved or used by all of this user id's processes.
                     This limit is enforced only if bit 1 of the vm.overcommit
                     sysctl is set.  Please see tuning(7) for a complete
                     description of this sysctl.

So this is a user's-processes-specific limit.

You do not have 0x2u set so RLIMIT_SWAP use is not enabled.
(Not really overcommit.)

The mask 0x4u being set might lead to more RAM being
considered as allocatable.

You do not have 0x4u set so you have normal allocatable
physical memory classification in use. (Not really
overcommit.)
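
Putting the three masks together, here is a small sketch of
decoding a vm.overcommit value in the != 0u vs. == 0u terms used
above. The struct and function names are invented for this
example. Only the bit numbering comes from tuning(7).

```c
#include <stdbool.h>

/* Hypothetical holder for the three options described in tuning(7). */
struct overcommit_flags {
	bool reject_over_swap;    /* 0x1u: fail when swap_reserved would exceed swap_total */
	bool enforce_rlimit_swap; /* 0x2u: enforce RLIMIT_SWAP */
	bool count_phys_mem;      /* 0x4u: count most physical memory as allocatable */
};

/* Apply each mask and record set (!= 0u) vs. unset (== 0u). */
static struct overcommit_flags
decode_overcommit(unsigned int value)
{
	struct overcommit_flags f;

	f.reject_over_swap = (value & 0x1u) != 0u;
	f.enforce_rlimit_swap = (value & 0x2u) != 0u;
	f.count_phys_mem = (value & 0x4u) != 0u;
	return f;
}
```

For your reported value of 0, decode_overcommit(0u) leaves all
three fields false, matching the notes above.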

> It can only be said that it's probably whatever is default for -current.
> The sysctl command above was run as root, as is buildworld.

The default is always 0x0u: none of the 3 options being
enabled.


===
Mark Millard
marklmi at yahoo.com