bin/164526: kill(1) can not kill process despite on -KILL
Jilles Tjoelker
jilles at stack.nl
Wed Feb 1 23:50:09 UTC 2012
The following reply was made to PR bin/164526; it has been noted by GNATS.
From: Jilles Tjoelker <jilles at stack.nl>
To: Коньков Евгений <kes-kes at yandex.ru>
Cc: bug-followup at FreeBSD.org, freeradius-users at lists.freeradius.org,
firebird-devel at lists.sourceforge.net
Subject: Re: bin/164526: kill(1) can not kill process despite on -KILL
Date: Thu, 2 Feb 2012 00:46:47 +0100
On Thu, Feb 02, 2012 at 12:16:39AM +0200, Коньков Евгений wrote:
> Reproduced again;
> the bug is repeatable:
> 1. radiusd + mod_perl + example.pl (which connects to Firebird) +
> Firebird
> 2. restart firebird
> 3. try to restart radiusd
> 4. the process falls into the STOP state
> # ps awx | grep radi
> 9438 ?? TLs 5:10.12 /usr/local/sbin/radiusd
> 27603 2 S+ 0:00.00 grep radi
> # procstat -k 9438
> PID TID COMM TDNAME KSTACK
> 9438 100080 radiusd - mi_switch sleepq_switch sleepq_wait _sx_xlock_hard _sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault trap_pfault trap calltrap
> 9438 100195 radiusd - mi_switch sleepq_switch sleepq_wait __lockmgr_args ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate vm_fault_hold vm_fault trap_pfault trap calltrap
> 9438 101144 radiusd - mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
> # ps wHl9438
> UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
> 133 9438 1 0 20 0 351124 322000 user m TLs ?? 0:03.65 /usr/local/sbin/radiusd
> 133 9438 1 0 20 0 351124 322000 ufs TLs ?? 0:00.00 /usr/local/sbin/radiusd
> 133 9438 1 0 20 0 351124 322000 - TLs ?? 0:05.28 /usr/local/sbin/radiusd
> If I can supply any other useful debug info, please answer as soon as you
> can; I cannot wait too long. Thank you.
OK, this looks like it may be useful for someone who knows more about
the VM system than I do. It is very likely a FreeBSD kernel bug though,
so building freeradius and/or firebird with debug information is
unlikely to be useful (apart from perturbing a race condition, if the
problem is related to a race condition).
My analysis: thread 101144 is attempting to shut down the process in
response to a signal, but needs to wait for 100080 and 100195 to finish
page fault processing. For thread 100195, page fault processing resulted
in deallocating a VM object based on some sort of file, and it is
blocked waiting on the vnode lock for the file. It may or may not hold a
lock on a user map. Thread 100080 needs to lock a user map to continue
processing (this means the fault is either a copy-on-write fault or the
first write to anonymous memory). It seems that 100080 is not holding
the vnode lock that 100195 needs.
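The analysis above reads each thread's state off the function names in its kernel stack. A minimal Python sketch of that reading follows, applied to the three threads of PID 9438. The helper name classify and the WAIT_HINTS table are hypothetical: the frame-to-meaning mapping is an assumption drawn only from this one trace, not an authoritative list of kernel lock routines.

```python
# Hypothetical helper: label each thread of a `procstat -k` dump with what it
# appears to be waiting on. WAIT_HINTS is an ASSUMPTION taken from this single
# trace; it is not an official list of kernel lock-wait routines.
WAIT_HINTS = [
    # (kernel frame, interpretation); checked in this order
    ("thread_single", "exiting; waiting for the other threads to stop"),
    ("_vm_map_lock_upgrade", "waiting for the user map lock (sx)"),
    ("ffs_lock", "waiting for a vnode lock (lockmgr)"),
]


def classify(procstat_line):
    """Return (tid, interpretation) for one data line of `procstat -k` output."""
    fields = procstat_line.split()
    tid = fields[1]
    frames = fields[4:]  # columns: PID TID COMM TDNAME KSTACK...
    for frame, meaning in WAIT_HINTS:
        if frame in frames:
            return tid, meaning
    return tid, "no known wait frame"


# The three threads of PID 9438, copied from the report.
TRACE = [
    "9438 100080 radiusd - mi_switch sleepq_switch sleepq_wait _sx_xlock_hard "
    "_sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault "
    "trap_pfault trap calltrap",
    "9438 100195 radiusd - mi_switch sleepq_switch sleepq_wait __lockmgr_args "
    "ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate "
    "vm_fault_hold vm_fault trap_pfault trap calltrap",
    "9438 101144 radiusd - mi_switch thread_suspend_switch thread_single exit1 "
    "sigexit postsig ast doreti_ast",
]

for line in TRACE:
    print(*classify(line))
```

Running it prints one line per thread, for example "100195 waiting for a vnode lock (lockmgr)". It only identifies what each thread waits for, not who holds the lock.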
If you have DDB (the kernel debugger) and WITNESS compiled into the kernel, the DDB
command
show locks
will show who owns these locks. This is probably
The output of
procstat -kka
may be useful (like the previous procstat command, but for all threads in
the system and with the offset within each function).
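Because procstat -kka lists every thread in the system, a small filter can help spot other threads that are also asleep on a lock and might be part of the chain. A Python sketch follows; blocked_threads is a hypothetical name, and LOCK_WAIT_FRAMES is an assumed, deliberately small set containing only the two lock-wait routines seen in this report.

```python
# Hypothetical filter for `procstat -kka` output: keep only threads that are
# asleep inside a lock-acquisition routine, grouped by process.
# ASSUMPTION: LOCK_WAIT_FRAMES holds only the two routines seen in this report;
# other lock types (mutexes, rwlocks, shared sx waits) would need more entries.
from collections import defaultdict

LOCK_WAIT_FRAMES = {"_sx_xlock_hard", "__lockmgr_args"}


def blocked_threads(procstat_kka_output):
    """Map (pid, comm) -> [(tid, wait frame), ...] for threads in a lock wait."""
    result = defaultdict(list)
    for line in procstat_kka_output.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to contain a stack.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, tid, comm = fields[0], fields[1], fields[2]
        # With -kk each frame carries an offset such as "+0x1a"; drop it.
        frames = [field.split("+", 1)[0] for field in fields[4:]]
        waits = [frame for frame in frames if frame in LOCK_WAIT_FRAMES]
        if waits:
            result[(pid, comm)].append((tid, waits[0]))
    return dict(result)
```

The saved output of procstat -kka can be passed in as one string. Matching frames by set membership keeps the sketch tolerant of thread names that contain spaces. It only reveals waiters, not lock owners; identifying owners still needs DDB.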
The output of
procstat -v 9438
shows the memory mappings of the process. This command may itself get
stuck because of the locks.
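Since thread 100195 is blocked on the vnode lock of a file backing a VM object, the vnode-backed rows of procstat -v list candidate files (the object being deallocated may no longer appear there). A minimal Python sketch, assuming (not verified against every FreeBSD release) that the header names START, END and TP columns, that PATH follows TP, and that vnode objects are tagged "vn"; vnode_mappings is a hypothetical name.

```python
# Hypothetical helper: list file-backed (vnode) mappings from `procstat -v`.
# ASSUMPTION: the header has START, END and TP columns, PATH comes after TP,
# and vnode-backed objects carry the type "vn"; check against your procstat(1).
def vnode_mappings(procstat_v_output):
    """Return (start, end, path) tuples for mappings whose type column is 'vn'."""
    lines = procstat_v_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    tp_index = header.index("TP")
    start_index, end_index = header.index("START"), header.index("END")
    mappings = []
    for line in lines[1:]:
        fields = line.split()
        # Skip blank rows and rows whose object type is not a vnode.
        if len(fields) <= tp_index or fields[tp_index] != "vn":
            continue
        # Everything after the type column is the path (it may contain spaces).
        path = " ".join(fields[tp_index + 1:])
        mappings.append((fields[start_index], fields[end_index], path))
    return mappings
```

This can only run on output that was actually captured, which may not happen if procstat -v hangs as described above.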
--
Jilles Tjoelker
More information about the freebsd-bugs mailing list