Killing processes from DDB

Thu Aug 30 08:12:28 UTC 2012

On Thu, Aug 30, 2012 at 07:43:46AM +0100, Matt Burke wrote:
> Is it possible to forcibly kill process from DDB which are unkillable from
> userland? My understanding is the 'kill' command is effectively the same as
> the userland version, so perhaps a process could be terminated by invoking
> an OOM handler or something?
Processes can only be terminated at safe points, where kernel code
explicitly checks for termination conditions and where the thread is
known not to hold kernel resources.

Yes, the kill command from ddb just sends a signal to the process;
handling of that signal is subject to normal signal delivery.
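
A rough userland analogy of the safe-point rule, not kernel code: a
"thread" is modelled as a list of steps, and a pending kill request is
acted on only at steps marked as safe points. The function name, the step
names and the choice of which step counts as a safe point are all
assumptions made up for this sketch.

```python
# Userland analogy only; this is not FreeBSD kernel code.
# A termination request is recorded when it arrives, but it is honoured
# only at a step marked as a safe point (no resources held there).

def run_until_terminated(steps, kill_at):
    """Return the index of the step at which the thread exits, or None.

    steps   -- list of (name, is_safe_point) tuples, executed in order
    kill_at -- index of the step during which the kill request arrives
    """
    kill_pending = False
    for i, (name, is_safe_point) in enumerate(steps):
        if i == kill_at:
            kill_pending = True      # the request is only recorded here
        if is_safe_point and kill_pending:
            return i                 # safe point reached: exit now
    return None                      # no safe point after the request


# Hypothetical step names, for illustration only.
steps = [
    ("take_lock", False),
    ("do_work", False),
    ("drop_lock", False),
    ("return_to_user", True),        # assumed safe point
]

# The request arrives during do_work but is acted on only at the safe point.
exit_step = run_until_terminated(steps, kill_at=1)

# A path that never reaches a safe point never acts on the request.
never_exits = run_until_terminated(steps[:3], kill_at=1)
```

The second call is the analogue of a process that stays unkillable: the
request is pending, but nothing ever looks at it.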

> 
> 
> I just had a VirtualBox instance crash and hog 100% CPU on my desktop:
> 
> mattb      36939 100.0 13.6 2577328 2276108 ??  I     6:13AM    2:28.44
> /usr/local/lib/virtualbox/VirtualBox
> 
> I kill -9 it
> 
> mattb      36939 100.0 13.6 2577328 2275804 ??  T     6:13AM    3:10.89
> /usr/local/lib/virtualbox/VirtualBox
> 
> Note it's moved to 'stop' state for some reason, yet is still eating 100%
> cpu time
> 
> # procstat -k 36939
>   PID    TID COMM             TDNAME           KSTACK
> 36939 227509 VirtualBox       -                <running>
> 36939 227836 VirtualBox       -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
The stop state indicates that the process is stopped or is being stopped.
The latter is your case. The process has one thread executing the exit1()
kernel function, which terminates the process. In the course of its work,
the function notifies all other threads of the exiting process that they
must terminate at the next safe point.

According to the procstat output, there is another thread in the process
which seems to be executing in the kernel. My guess is that it loops
somewhere without reaching any check for termination.
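
The reading of the procstat output above can be sketched as a small
parser. It takes the quoted `procstat -k` text with the wrapped stack line
joined back together, and picks out the thread that is inside exit1() and
the thread reported as running. The helper name is made up, and the
whitespace splitting assumes that COMM and TDNAME contain no spaces.

```python
# The `procstat -k` output quoted in this thread, with the wrapped
# KSTACK line joined into a single line.
PROCSTAT_K = """\
  PID    TID COMM             TDNAME           KSTACK
36939 227509 VirtualBox       -                <running>
36939 227836 VirtualBox       -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
"""


def parse_kstacks(text):
    """Map TID -> list of KSTACK entries.

    Assumption: COMM and TDNAME contain no whitespace, so the stack
    starts at the fifth whitespace-separated field.
    """
    stacks = {}
    for line in text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue                     # not a complete thread row
        stacks[int(fields[1])] = fields[4:]
    return stacks


stacks = parse_kstacks(PROCSTAT_K)

# Thread that is tearing the process down and waiting for the others.
exiting = [tid for tid, st in stacks.items() if "exit1" in st]

# Thread for which procstat could not report a stack because it is on CPU.
running = [tid for tid, st in stacks.items() if st == ["<running>"]]
```

Here `exiting` holds thread 227836 and `running` holds thread 227509,
which is the thread whose backtrace is needed.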

> 
> 
> Could this be the trigger - 9.0 binary (from pkgng) against 9.1?
> 
> $ procstat -b 1 36939
>   PID COMM                OSREL PATH
>     1 init               901000 /sbin/init
> 36939 VirtualBox         900044 /usr/local/lib/virtualbox/VirtualBox
> 
> 
> I couldn't even kill it with "dtrace -n 'pid$target:::' -p 36939 -l" -
> which so far has proven reliable in killing anything:
> 
> # dtrace -n 'pid$target:::' -p 2021 -l    <--- unimportant proc
> Bus error: 10 (core dumped)
> # dtrace -n 'pid$target:::' -p 2044 -l    <--- unimportant proc
> Bus error: 10 (core dumped)
> # dtrace -n 'pid$target:::' -p 36939 -l   <--- virtualbox hangs dtrace
> ^C
> 
> I couldn't truss the process or use gcore to get a dump, so my only option
> was a reboot.  Does anyone have any suggestions on a course of action in
> case this happens again? I can't get a kernel dump since the machine
> doesn't have enough swap (small SSDs)

The way to debug the issue is to break into ddb on the console and get
a backtrace for the spinning thread, then continue, then break again
and get another backtrace. Do this several times to see where the code
spins.
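
One way to compare the collected backtraces, as a sketch only: frames
that appear in every sample, counted from the outermost frame inwards, are
the stable part of the stack, and the innermost of those is a candidate
for the function that contains the loop. The frame names below are
invented for illustration and are not taken from any real trace; the
result is a hint for where to look, not proof.

```python
def common_outer_frames(samples):
    """Return the frames shared by all samples, innermost first.

    Each sample is a list of function names with the innermost frame
    first, which is the order a backtrace is printed in.
    """
    if not samples:
        return []
    shared = []
    # Walk from the outermost frame inwards while all samples agree.
    for frames in zip(*(reversed(s) for s in samples)):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    shared.reverse()
    return shared


# Hypothetical frame names, for illustration only.
samples = [
    ["helper_a", "poll_loop", "ioctl_handler", "syscall"],
    ["helper_b", "helper_c", "poll_loop", "ioctl_handler", "syscall"],
    ["poll_loop", "ioctl_handler", "syscall"],
]

shared = common_outer_frames(samples)

# The innermost frame present in every sample is where to look first.
loop_candidate = shared[0] if shared else None
```

With these made-up samples the frames below `poll_loop` change from one
sample to the next while `poll_loop` itself is always present, so that is
where the spinning would be suspected.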

It is impossible to even start guessing what is wrong without seeing
the backtrace.

Still, recompiling VB could be a good idea, since the VB kernel module
uses non-stable KPI and KBI, so what you see might be just a build issue.