[Bug 289394] sysutils/slurm-wlm: fix slurmd crash when using task/pgid plugin
Date: Mon, 08 Sep 2025 23:41:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289394
Bug ID: 289394
Summary: sysutils/slurm-wlm: fix slurmd crash when using
task/pgid plugin
Product: Ports & Packages
Version: Latest
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: Individual Port(s)
Assignee: ports-bugs@FreeBSD.org
Reporter: rikka.goering@outlook.de
The task/pgid plugin, as currently added, dereferences step->pgid from
stepd_step_rec_t. On FreeBSD with Slurm 23.11 this field is not initialized, so
slurmd segfaults when launching tasks (REQUEST_LAUNCH_TASKS).
Reproduction:
- Configure TaskPlugin=pgid in slurm.conf.
- Start slurmd -Dvvv.
- Submit a trivial job:
    srun -N1 -n1 /bin/echo $hostname
- Result: slurmd segfaults immediately on the task launch RPC.
Fix:
- Rework task_p_pre_launch() and task_p_signal() to manage a cached PGID inside
the plugin itself.
- No longer touch step->pgid.
- Each step process creates/joins its own process group with setpgid(0,0) and
caches the result in pgid_cached.
- When Slurm signals the task, the plugin calls killpg(pgid_cached, sig).
- This avoids dereferencing uninitialized data structures and removes the
crash.
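
The approach described above can be sketched roughly as follows. This is only
an illustration, not the actual patch: the function names example_pre_launch()
and example_signal() are hypothetical stand-ins for task_p_pre_launch() and
task_p_signal(), the Slurm-specific arguments are omitted, and the -1 sentinel
and the EPERM handling are assumptions of this sketch. Only pgid_cached,
setpgid(0,0) and killpg() come from the description.

```c
#define _XOPEN_SOURCE 700   /* make killpg() visible under strict -std modes */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Process group cached by the plugin itself; -1 means "not set yet"
 * (the sentinel value is an assumption of this sketch). */
static pid_t pgid_cached = (pid_t)-1;

/* Hypothetical stand-in for task_p_pre_launch(): called in the step
 * process, it creates/joins its own process group and remembers it. */
int example_pre_launch(void)
{
    /* setpgid(0, 0) makes the caller the leader of a group whose ID is
     * its own PID. It fails with EPERM if the caller is already a
     * session leader; in that case it already leads its own group. */
    if (setpgid(0, 0) != 0 && errno != EPERM)
        return -1;

    pgid_cached = getpgrp();
    return 0;
}

/* Hypothetical stand-in for task_p_signal(): forwards sig to the cached
 * group instead of reading a field from the step record. */
int example_signal(int sig)
{
    /* Refuse to signal if nothing was cached, so we never pass a bogus
     * group ID to killpg(). */
    if (pgid_cached <= 1) {
        errno = ESRCH;
        return -1;
    }
    return killpg(pgid_cached, sig);
}
```

Calling example_signal(0) after example_pre_launch() only checks that the
cached group still exists, since signal 0 is not delivered.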
Changes:
- Patch src/plugins/task/pgid/task_pgid.c:
  - Replace old logic referencing step->pgid.
  - Add static pid_t pgid_cached.
  - Implement safe group creation/join + signal forwarding.