Re: port binary dumping core on recent head in poudriere

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 25 Nov 2024 23:21:20 UTC

On Nov 25, 2024, at 14:23, Guido Falsi <mad@madpilot.net> wrote:

> On 25/11/24 23:15, Dimitry Andric wrote:
>> On 25 Nov 2024, at 23:12, Mark Millard <marklmi@yahoo.com> wrote:
>>> 
>>> On Nov 25, 2024, at 13:27, Guido Falsi <mad@madpilot.net> wrote:
>>> 
>>>> On 25/11/24 22:18, Dag-Erling Smørgrav wrote:
>>>>> Mark Millard <marklmi@yahoo.com> writes:
>>>>>> Guido Falsi <mad@madpilot.net> writes:
>>>>>>> On 25/11/24 09:17, Dag-Erling Smørgrav wrote:
>>>>>>>> Dimitry Andric <dim@FreeBSD.org> writes:
>>>>>>>>> Probably best to create a bugzilla ticket, but as I said before, I
>>>>>>>>> cannot reproduce this.
>>>>>>>> I can.  My builder is running 15 and sees segfaults while building
>>>>>>>> packages for 14 and 15 but not for 13.
>>>>>>> BTW removing optimizations (CPUTYPE) for only the affected ports made
>>>>>>> guile2 work again. Did not solve the issue with sassc though.  [...]
>>>>>>> I'm also using ccache, but that does not look relevant.
>>>>>> I've never used ccache or analogous and get the libsass.so.1.0.0
>>>>>> .got.plt corruption that I've reported on the lists anyway.
>>>>> I don't use ccache or optimizations.  Here's an example of sassc
>>>>> segfaulting in a 14.1-RELEASE-p6 jail:
>>>>>  https://pkg.des.dev/logs/data/14amd64-default/2024-11-24_19h29m04s/logs/errors/plasma5-breeze-gtk-5.27.11.log
>>>>> which matches the following entry from `/var/log/messages`:
>>>>>  Nov 24 21:23:06 pkg kernel: pid 71277 (sassc), jid 253, uid 65534: exited on signal 11 (core dumped)
>>>>> The poudriere host is a bhyve VM with 48 cores and 192 GB RAM on a
>>>>> 32c/64t AMD EPYC 7502P with 256 GB RAM.
>>>> 
>>>> I sincerely hope this is not relevant but my CPU is also AMD: AMD Ryzen 5 5600G
>>> 
>>> The amd64 system type that I have access to and used
>>> for my testing:
>>> 
>>> AMD 7950X3D (16 core, 32 thread, so 32 FreeBSD-cpus) with 192 GiBytes of RAM
>> I'm on Intel, and I don't see any crashes at all. So, are we looking at some CPU specific issue here?
> 
> We can't say for sure, but we definitely have all people reporting the issue on the same CPU brand, so it's some indication I guess.
> 
> I was hoping it would not come to this because I suspect such issues are quite difficult to diagnose.

Unfortunately, for amd64 I only have access to:

- An old Threadripper 1950X system (untested so far)
- The 7950X3D system

No Intel systems.

If someone has both AMD and Intel systems and
boot-and-operate media that works on both, say
a USB drive that can simply be moved between
machines, running the tests on both would be
appropriate. (Implication: the media is not
tailored to the CPU specifics, so the same system
software is tested in both places.)

I'll note that the media in my context is PCIe Optane,
ZFS based. I could instead try a U.2 Optane in a PCIe
adaptor, which has UFS, for building textproc/libsass.
(The U.2 content is basically an rsync copy of the ZFS
Optane media's live directory tree, with node naming
and such adjusted afterwards.)

What do other folks have for the file system(s)
involved?
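
For anyone comparing hosts, here is a minimal sketch for pulling
crash entries like the sassc one quoted above out of
/var/log/messages, so the results can be set beside each machine's
CPU and file system. The regular expression is modeled only on that
single quoted kernel line; the names parse_crash and count_crashes
are hypothetical, and other message variants are not handled.

```python
import re

# Pattern modeled on the one kernel line quoted in this thread
# (assumption: other hosts log the same format).
CRASH_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<prog>[^)]+)\), "
    r"jid (?P<jid>\d+), uid (?P<uid>\d+): "
    r"exited on signal (?P<signal>\d+)"
    r"(?P<core> \(core dumped\))?"
)


def parse_crash(line):
    """Return a dict describing one crash line, or None if no match."""
    m = CRASH_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "prog": m.group("prog"),
        "jid": int(m.group("jid")),
        "uid": int(m.group("uid")),
        "signal": int(m.group("signal")),
        "core_dumped": m.group("core") is not None,
    }


def count_crashes(lines):
    """Count crash lines per (program, signal) pair."""
    counts = {}
    for line in lines:
        rec = parse_crash(line)
        if rec is not None:
            key = (rec["prog"], rec["signal"])
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Reads the default FreeBSD log location; adjust as needed.
    with open("/var/log/messages", errors="replace") as fh:
        for (prog, sig), n in sorted(count_crashes(fh).items()):
            print(f"{prog}: signal {sig}: {n} time(s)")
```

Reporting these counts together with the CPU model and the file
system type of the build area would make the reports easier to
compare across machines.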

===
Mark Millard
marklmi at yahoo.com