[Bug 290843] killpg deadlock against a stopped interrupted fork
Date: Thu, 06 Nov 2025 19:09:03 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=290843
Bug ID: 290843
Summary: killpg deadlock against a stopped interrupted fork
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: bdrewery@FreeBSD.org
CC: kib@FreeBSD.org, markj@FreeBSD.org
This is on CURRENT at 55c28005f544282b984ae0e15dacd0c108d8ab12, but I've seen
this for a few years. I had to disable the DEADLKRES option because I hit it so
often in Poudriere tests. I finally found a simple repro today.
Basic summary: `killpg(pgid, STOP)` against a child that is in the middle of a
fork blocks any further `killpg(pgid)`.
Repro given later. Here's the simplest result:
```
# procstat -t 51155 31783
  PID    TID COMM TDNAME CPU PRI STATE WCHAN
51155 302071 sh   -       -1 115 sleep killpg r
31783 128716 sh   -       -1 115 stop  -
# procstat -kk 51155 31783
  PID    TID COMM TDNAME KSTACK
51155 302071 sh   -      mi_switch+0x172 sleepq_switch+0x109 _sx_xlock_hard+0x513 _sx_xlock+0xac killpg1+0x138 kern_kill+0x222 amd64_syscall+0x451 fast_syscall_common+0xf8
31783 128716 sh   -      mi_switch+0x172 thread_suspend_check+0xbd sig_intr+0x7a fork1+0x448 sys_fork+0x54 amd64_syscall+0x451 fast_syscall_common+0xf8
```
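To spot the same pair of threads on another machine, a rough filter over `procstat -kk` output might look like the sketch below. It assumes each thread's stack is on a single line, as procstat emits it. The function name `classify_kstacks` and the two labels are made up for illustration. The frame names it matches are simply the ones seen in the stacks above, so other kernels may need different patterns.

```shell
# Hypothetical helper: read `procstat -kk` output on stdin and label threads
# whose kernel stacks match the two patterns seen in this report.
classify_kstacks() {
    awk '
        # Waiter: sleeping on an sx lock taken from killpg1().
        /killpg1[+]/ && /_sx_xlock/ {
            print $1, $2, "blocked-in-killpg"; next
        }
        # Forker: suspended via thread_suspend_check() while inside fork1().
        /fork1[+]/ && /thread_suspend_check[+]/ {
            print $1, $2, "stopped-in-fork"
        }
    '
}
```

Usage would be something like `procstat -kk -a | classify_kstacks`, which prints the PID, TID and a label for each matching thread.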
`kill -CONT -31783`, which signals the process group and so goes through
killpg, blocks on the killpg racer lock. `kill -CONT 31783`, which signals the
single process and avoids killpg, does not block.
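Since signalling by PID did not block here, one possible stopgap is to walk the group's members and signal each one individually instead of using a negative PID. This is only a sketch. The helper names `pgrp_members` and `signal_group_members` are hypothetical, and the enumeration is not atomic, so a process forked in between is missed. I have only observed the single-PID `kill -CONT` case above, not this loop.

```shell
# Hypothetical helper: read "PID PGID" pairs on stdin and print the PIDs
# whose process group ID equals $1.
pgrp_members() {
    awk -v pgid="$1" '$2 == pgid { print $1 }'
}

# Hypothetical helper: send signal $1 to every current member of process
# group $2 with one kill per PID, rather than one kill on the negative PGID.
signal_group_members() {
    ps -A -o pid= -o pgid= | pgrp_members "$2" | while read -r pid; do
        # A member may have exited since ps ran; ignore that failure.
        kill -s "$1" "$pid" 2>/dev/null || :
    done
}
```

For the case above that would be `signal_group_members CONT 31783`.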
Repro:
```
# `kill -STOP; kill -TERM; kill -CONT` against a forking job (job control
# enabled).
# foo() is trying to repro a blank $() value, which is not required for the
# repro but brings in enough forking to trigger the problem quickly, so I left
# it in.
sh -c 'trap "kill -9 %1; exit" INT; foo() { unset cmd; cmd=$(/sbin/sysctl -n vm.loadavg|/usr/bin/awk "{print \$2,\$3,\$4}"); case "${cmd:+set}" in set) ;; *) exit 99 ;; esac }; runner() { while foo; do :; done }; launch() { local -; set -m; PS4="child+ " runner & }; set -x; while :; do launch; sleep 0.1; kill -STOP %1; kill -TERM %1; kill -CONT %1; ret=0; wait; if [ $ret -eq 99 ]; then exit 99; fi; done;'
```
https://reviews.freebsd.org/D40493 and https://reviews.freebsd.org/D41128 may
contain relevant discussion and earlier attempts at a fix.