[Bug 288983] sysutils/slurm-wlm: slurmd and slurmstepd crash due to missing sockaddr length handling in bind() / connect()
Date: Wed, 20 Aug 2025 23:25:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288983
Bug ID: 288983
Summary: sysutils/slurm-wlm: slurmd and slurmstepd crash due to missing sockaddr length handling in bind() / connect()
Product: Ports & Packages
Version: Latest
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: Individual Port(s)
Assignee: ports-bugs@FreeBSD.org
Reporter: rikka.goering@outlook.de
After applying the patches that fix bugs #288617, #288668, and #288880, both
slurmctld and slurmd start successfully and initially connect to each other.
After some time, however, the daemons lose their connection, submitting tasks
via srun fails, and slurmd eventually crashes with a segmentation fault.
The root cause appears to be that several bind() and connect() calls on FreeBSD
do not set the sockaddr length fields (sun_len, sin_len, sin6_len) correctly and
do not pass a matching address length to the call. As a result, the sockets are
set up improperly and fail at runtime with "Invalid argument" (EINVAL).
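For illustration, a minimal C sketch of the IPv4/IPv6 side of such a fix. It assumes the failing calls pass an address length that does not match the address family (for example, sizeof(struct sockaddr_storage) for an AF_INET address); that is a hypothesis, not something confirmed from the Slurm sources. The helper names sockaddr_fixup_len() and bind_loopback_v4() and the HAVE_SOCKADDR_LEN_FIELD macro are hypothetical and not Slurm code, and the list of platforms behind that macro is also an assumption.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: BSD-derived systems have sin_len/sin6_len; Linux does not. */
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_SOCKADDR_LEN_FIELD 1
#endif

/*
 * Hypothetical helper (not Slurm code): return the exact address length for
 * the family stored in *ss and, where the field exists, write it into
 * sin_len / sin6_len. Returns 0 for families it does not handle.
 */
static socklen_t sockaddr_fixup_len(struct sockaddr_storage *ss)
{
    assert(ss != NULL);
    switch (ss->ss_family) {
    case AF_INET:
#ifdef HAVE_SOCKADDR_LEN_FIELD
        ((struct sockaddr_in *)ss)->sin_len = sizeof(struct sockaddr_in);
#endif
        return sizeof(struct sockaddr_in);
    case AF_INET6:
#ifdef HAVE_SOCKADDR_LEN_FIELD
        ((struct sockaddr_in6 *)ss)->sin6_len = sizeof(struct sockaddr_in6);
#endif
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}

/*
 * Hypothetical example: bind a TCP socket to 127.0.0.1 on an ephemeral
 * port, passing the family-specific length rather than
 * sizeof(struct sockaddr_storage). Returns the descriptor, or -1 on error.
 */
static int bind_loopback_v4(void)
{
    struct sockaddr_storage ss;
    struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
    int fd;

    memset(&ss, 0, sizeof(ss));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(0);                     /* let the kernel pick */
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&ss, sockaddr_fixup_len(&ss)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The design choice here is to derive the length from ss_family right before the call, so the sin_len / sin6_len field and the length argument passed to bind() / connect() cannot disagree, while the same code still builds on Linux, where those fields do not exist.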
How to reproduce:
srun -N1 -w Torch -t1 /bin/hostname
Actual result:
srun: error: unable to initialize step launch listening socket: Invalid argument
srun: Required node not available (down, drained or reserved)
srun: job 3 queued and waiting for resources
slurmd eventually segfaults.
Expected result:
The command runs successfully and prints the hostname of the worker node (here:
Torch).
Workaround:
No workaround is known other than patching the sources to set the sockaddr
length fields (sun_len, sin_len, sin6_len) correctly and pass the matching
length to bind() / connect().
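A similar sketch for the Unix-domain case (sun_len), again hypothetical and not taken from Slurm. The made-up helper fill_sockaddr_un() copies the path, computes the length the same way the BSD SUN_LEN() macro does (offset of sun_path plus the path length), stores it in sun_len where that field exists (the platform list is an assumption), and returns it for use with bind() / connect().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Assumption: BSD-derived systems have sun_len; Linux does not. */
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_SOCKADDR_LEN_FIELD 1
#endif

/*
 * Hypothetical helper (not Slurm code): fill *addr for the filesystem path
 * `path` and return the length to pass to bind()/connect(), computed like
 * the BSD SUN_LEN() macro. Returns 0 if the path does not fit in sun_path.
 */
static socklen_t fill_sockaddr_un(struct sockaddr_un *addr, const char *path)
{
    size_t plen;
    socklen_t len;

    assert(addr != NULL && path != NULL);
    plen = strlen(path);
    if (plen >= sizeof(addr->sun_path))
        return 0;                              /* would be truncated */

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, plen + 1);    /* copy including the NUL */

    len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + plen);
#ifdef HAVE_SOCKADDR_LEN_FIELD
    addr->sun_len = (unsigned char)len;        /* BSD-only length field */
#endif
    return len;
}
```

A caller would check that the returned length is non-zero and then call, for example, bind(fd, (struct sockaddr *)&addr, len); the socket path itself is whatever the daemon is configured to use.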
I am currently preparing patches for this and will upload a unified git diff
once they are finished and tested.