Re: Big compat issue with a recent current (zfs + syscall)

From: Kyle Evans <kevans_at_FreeBSD.org>
Date: Thu, 21 Aug 2025 21:56:57 UTC
On 8/21/25 16:27, Alexander Leidinger wrote:
> Am 2025-08-21 23:14, schrieb Kyle Evans:
>> On 8/21/25 11:16, Xin LI wrote:
>>> On Thu, Aug 21, 2025 at 6:19 AM Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>
>>>     Hi,
>>>
>>>     I tried to update from -current as of 2025-08-11-154054 (CEST) to
>>>     2025-08-20-075320. I've updated the kernel and base, but no jail.
>>>     Result:
>>>
>>>     1) Not all jails got up. For mysql I have the DB on a separate dataset,
>>>     I attach the dataset to the jail (grep dataset /etc/rc.d/jail) to be
>>>     able to manage it from within the jail. I got the message that the
>>>     dataset is already attached (first start of the jail at boot). When I
>>>     resolved this by simply not attaching the dataset, mysqld died with a
>>>     bad syscall.
>>>
>>>     2) A lot of processes inside jails segfaulted.
>>>
>>>     I then updated the jails from the build. More processes came up, but
>>>     some still died (e.g. php_fpm).
>>>
>>>     At that point I reverted everything (this email is handled via the jails
>>>     on this host). I still have the BE which causes issues, in case someone
>>>     needs to get some info out of it.
>>>
>>>     I have not seen anything in UPDATING which suggests anything in this
>>>     regard.
>>>
>>>     src.conf:
>>>     ---snip---
>>>     WITHOUT_PROFILE=yes
>>>     CFLAGS+=-DFTP_COMBINE_CWDS
>>>     MALLOC_PRODUCTION=yes
>>>     WITH_MALLOC_PRODUCTION=yes
>>>     WITHOUT_LLVM_ASSERTIONS=yes
>>>     KERNCONF=ANDROMEDA
>>>     WITH_RETPOLINE=yes
>>>     WITH_KERNEL_RETPOLINE=yes
>>>     WITH_RELRO=yes
>>>     WITH_BIND_NOW=yes
>>>     OPT_INIT_ALL=zero
>>>     WITH_ZEROREGS=yes
>>>     WITHOUT_CLEAN=yes
>>>     LOADER_GZIP_SUPPORT=no
>>>     LOADER_BZIP2_SUPPORT=no
>>>     LOADER_BIOS_TEXTONLY=no
>>>     LOADER_NFS_SUPPORT=no
>>>     LOADER_TFTP_SUPPORT=no
>>>     LOADER_CD9660_SUPPORT=no
>>>     ---snip---
>>>
>>>     src-env.conf:
>>>     ---snip---
>>>     WITH_META_MODE=yes
>>>     FORTIFY_SOURCE=2
>>>     ---snip---
>>>
>>>     Bye,
>>>     Alexander.
>>>
>>>     --
>>>     http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
>>>     http://www.FreeBSD.org netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
>>>
>>> Since you are using a custom kernel, is it possible that you didn't have COMPAT_FREEBSD14?  (Recently [gs]etgroups were changed, with compatibility syscalls moved to COMPAT_FREEBSD14).
> 
> UPDATING only mentions VMM stuff for COMPAT_FREEBSD14. I'll give this a try tomorrow. But would this also affect the zfs dataset stuff?
> 

Reading your initial e-mail, I think you actually got past the main reason to need COMPAT_FREEBSD14 when you rebuilt
after mysqld died with a bad syscall.  I had missed that detail the first time, sorry.
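
To rule the option in or out on a custom kernel anyway, a quick look at the config file is enough. A minimal sketch in Python, assuming the config is plain text with `options` / `nooptions` lines; the function name and the sample excerpt are made up for illustration (this is not the real ANDROMEDA config), and `include` directives are not followed, so a config that pulls in GENERIC needs that file checked as well:

```python
import re

def has_kernel_option(config_text, option):
    """Return True if `option` is enabled by the last matching line.

    Only 'options NAME' / 'nooptions NAME' lines in the given text are
    considered; 'include' directives are NOT followed (sketch assumption).
    """
    enabled = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        m = re.match(r"(options|nooptions)\s+(\S+)", line)
        if not m:
            continue
        name = m.group(2).split("=", 1)[0]        # 'NAME=value' -> 'NAME'
        if name == option:
            enabled = (m.group(1) == "options")   # a later line overrides
    return enabled

# Hypothetical excerpt for illustration only.
sample = """
ident   EXAMPLE
options COMPAT_FREEBSD13   # older binaries
#options COMPAT_FREEBSD14
"""
print(has_kernel_option(sample, "COMPAT_FREEBSD14"))
```

In practice the text would come from reading the kernel config file named by KERNCONF; where that file lives depends on the architecture and source tree layout.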

I wouldn't expect it to help with the zfs dataset problem, though.

>> I had wondered the same, but the use of 'segfault' gave me pause; these would be SIGSYS rather than SIGSEGV, but that could just be a minor terminology dispute.
> 
> Aug 20 10:35:32 Andromeda kernel: [566445] pid 52166 (auth), jid 50, uid 143: exited on signal 6 (no core dump - sugid process denied by kern.sugid_coredump)
> Aug 20 10:35:37 Andromeda kernel: [566450] pid 52172 (auth), jid 50, uid 143: exited on signal 6 (no core dump - sugid process denied by kern.sugid_coredump)
> Aug 20 10:35:44 Andromeda kernel: [566457] pid 52179 (auth), jid 50, uid 143: exited on signal 6 (no core dump - sugid process denied by kern.sugid_coredump)
> Aug 20 10:35:51 Andromeda kernel: [566463] pid 52185 (auth), jid 50, uid 143: exited on signal 6 (no core dump - sugid process denied by kern.sugid_coredump)
> Aug 20 10:35:56 Andromeda kernel: [566469] pid 52193 (auth), jid 50, uid 143: exited on signal 6 (no core dump - sugid process denied by kern.sugid_coredump)
> 

Signal 6 is SIGABRT, which would seem to imply something like an assertion being tripped, which is a bit unusual.  You might
need to set kern.sugid_coredump to 1 temporarily and see if you can gather some more context from a coredump.
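
To see at a glance which processes die on which signal, the kernel messages quoted above can be tallied. A rough sketch in Python, assuming the message format shown in the quoted log; the regular expression and function names are mine, not anything from the tree. Signal names are looked up on the machine running the script, so it should run on the FreeBSD host itself (6 is SIGABRT on FreeBSD and Linux alike, but SIGSYS, for instance, is numbered differently):

```python
import re
import signal
from collections import Counter

# Matches the "pid N (comm), jid J, uid U: exited on signal S" part of
# the kernel message; anything before or after it is ignored.
LINE_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<comm>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<sig>\d+)"
)

def signal_name(signo):
    """Map a signal number to its name on the local platform."""
    try:
        return signal.Signals(signo).name
    except ValueError:
        return "signal %d" % signo

def summarize(log_text):
    """Count crashes per (command, jail id, signal name)."""
    counts = Counter()
    for m in LINE_RE.finditer(log_text):
        key = (m.group("comm"), int(m.group("jid")),
               signal_name(int(m.group("sig"))))
        counts[key] += 1
    return counts

# Two of the lines from the quoted log, rejoined.
sample = (
    "Aug 20 10:35:32 Andromeda kernel: [566445] pid 52166 (auth), jid 50, "
    "uid 143: exited on signal 6 (no core dump - sugid process denied by "
    "kern.sugid_coredump)\n"
    "Aug 20 10:35:37 Andromeda kernel: [566450] pid 52172 (auth), jid 50, "
    "uid 143: exited on signal 6 (no core dump - sugid process denied by "
    "kern.sugid_coredump)\n"
)

for (comm, jid, sig), n in summarize(sample).items():
    print(f"{comm} (jid {jid}): {n} x {sig}")
```

Feeding it the full log instead of the sample would show whether everything dying is SIGABRT or whether SIGSYS or SIGSEGV exits are mixed in.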