[Bug 289394] sysutils/slurm-wlm: fix slurmd crash when using task/pgid plugin

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 08 Sep 2025 23:41:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289394

            Bug ID: 289394
           Summary: sysutils/slurm-wlm: fix slurmd crash when using
                    task/pgid plugin
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: ports-bugs@FreeBSD.org
          Reporter: rikka.goering@outlook.de

The task/pgid plugin as currently added dereferences step->pgid from
stepd_step_rec_t. On FreeBSD/Slurm 23.11 this field is not initialized, causing
slurmd to segfault when launching tasks (REQUEST_LAUNCH_TASKS).


Reproduction:

- Configure TaskPlugin=pgid in slurm.conf.
- Start slurmd -Dvvv.
- Submit a trivial job:
  srun -N1 -n1 /bin/echo $hostname
- Result: slurmd segfaults immediately on the task launch RPC.


Fix:

- Rework task_p_pre_launch() and task_p_signal() to manage a cached PGID inside
the plugin itself.
- No longer touch step->pgid.
- Each step process creates/joins its own process group with setpgid(0,0) and
caches the result in pgid_cached.
- When Slurm signals the task, the plugin calls killpg(pgid_cached, sig).
- This avoids dereferencing uninitialized data structures and removes the
crash.
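
The approach in the list above can be sketched roughly as follows. This is a
minimal illustration, not the code from the attached patch: the helper names
pgid_cache_join() and pgid_cache_signal() are hypothetical, and the Slurm
plugin entry points and their arguments are left out so the fragment compiles
on its own. Only pgid_cached, setpgid(0,0) and killpg() come from the
description.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Plugin-local cache named in the report; -1 means "no group recorded yet". */
static pid_t pgid_cached = (pid_t)-1;

/*
 * Hypothetical helper (not the patch's code): put the calling process in
 * its own process group and remember the group ID.
 */
static int pgid_cache_join(void)
{
    /*
     * setpgid(0, 0) fails with EPERM when the caller is a session leader.
     * A session leader already leads its own group, so that case is
     * treated as success here.
     */
    if (setpgid(0, 0) != 0 && errno != EPERM)
        return -1;

    pgid_cached = getpgrp();
    return 0;
}

/*
 * Hypothetical helper: forward sig to the cached group. Refuses to signal
 * when nothing usable has been cached, so killpg() never sees an unset
 * PGID (or PGID 1, which must not be signalled as a group).
 */
static int pgid_cache_signal(int sig)
{
    if (pgid_cached <= 1) {
        errno = ESRCH;
        return -1;
    }
    return killpg(pgid_cached, sig);
}
```

In the plugin, the first helper would presumably be called from
task_p_pre_launch() and the second from task_p_signal(). That mapping follows
the description above, not the patch text. Passing 0 as the signal only checks
that the cached group still exists and delivers nothing.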


Changes:

- Patch src/plugins/task/pgid/task_pgid.c:
  - Replace old logic referencing step->pgid.
  - Add static pid_t pgid_cached.
  - Implement safe group creation/join + signal forwarding.

-- 
You are receiving this mail because:
You are the assignee for the bug.