mpd has hung
Bjoern A. Zeeb
bzeeb-lists at lists.zabbadoz.net
Sat Feb 20 12:05:07 UTC 2010
On Sat, 20 Feb 2010, Bjoern A. Zeeb wrote:
> On Fri, 19 Feb 2010, Mikolaj Golub wrote:
>> On Thu, 18 Feb 2010 17:32:37 +0200 Nikos Vassiliadis wrote:
>>> On 2/17/2010 3:26 PM, Alexander Shikoff wrote:
>>>> Hello All,
>>>> I have mpd 5.3 running on 8.0-RC1 as PPPoE server (now only 5 clients).
>>>> Today mpd process hung and I cannot kill it with -9 signal, and I cannot
>>>> access its console via telnet.
>>>> State of process in `top` output is STOP:
>>>> 73551 root 2 44 0 29588K 5692K STOP 6 0:32 0.00% mpd5
>>>> # procstat -kk 73551
>>>> PID TID COMM TDNAME KSTACK
>>>> 73551 100233 mpd5 - mi_switch+0x16f
>>>> sleepq_wait+0x42 _cv_wait+0x111 flowtable_flush+0x51 if_detach+0x2f2
>>>> ng_iface_shutdown+0x1e ng_rmnode+0x167 ng_apply_item+0xef7
>>>> ng_snd_item+0x2ce ngc_send+0x1d2 sosend_generic+0x3f6 kern_sendit+0x13d
>>>> sendit+0xdc sendto+0x4d syscall+0x1da Xfast_syscall+0xe1
>>>> 73551 100502 mpd5 - mi_switch+0x16f
>>>> thread_suspend_switch+0xc6 thread_single+0x1b6 exit1+0x72 sigexit+0x7c
>>>> postsig+0x306 ast+0x279 doreti_ast+0x1f
>>>> Is there a way to stop a process without rebooting a whole system?
>>>> Thanks in advance!
>>>> P.S. I'm ready for experiments with it before tonight, but I cannot
>>>> force system to crash in order to get crash dump right now.
>>> It's probably too late now, but are you sure that nobody pressed
>>> CTRL-Z while in the mpd console???
>>> CTRL-Z will send SIGSTOP to the process and the process will
>>> stop. While stopped, all processing stops (including receiving
>>> SIGKILL, you cannot kill it, and the signals are queued). You
>>> have to send SIGCONT for the process to continue.
>> We were discussing this problem with Alexander in another
>> maillist. And it looks like the problem is the following.
>> The mpd5 thread was detaching the ng interface, and when doing
>> flowtable_flush() it slept in cv_wait waiting for the flowclean_cycles
>> variable to be updated. It should have been awakened by the flowcleaner
>> thread, but this thread got stuck in an endless loop, supposedly in
>> flowtable_clean_vnet()/flowtable_free_stale(), I think because of an
>> inconsistent state of some lists (iface?) due to if_detach being in progress.
> I have patches that are out for review.
I am not sure if they apply cleanly as they are broken out of the tail
side of a larger patchset.
If you are not using VIMAGEs you could ignore the ones I marked with (*).
If you are still seeing the hang and have DDB support in your kernel,
then break into the debugger and save the complete output of
--
Bjoern A. Zeeb         It will not break if you know what you are doing.
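
As an illustration of the wait described in the quoted analysis (one thread
sleeping until flowclean_cycles changes, another thread expected to update it
and wake the sleeper), here is a minimal userland sketch in Python. It is not
FreeBSD kernel code: the class FlowCleanerModel, its method names, and the
timeout are hypothetical and exist only in this model. The _cv_wait seen in
the stack trace is an untimed wait, so a waiter there has no such escape if
the cleaner never completes a pass.

```python
import threading


class FlowCleanerModel:
    """Userland model of the flush/cleaner handshake; names are hypothetical."""

    def __init__(self):
        self.cond = threading.Condition()
        self.cycles = 0  # stands in for the flowclean_cycles counter

    def cleaner_pass(self):
        """Model one completed cleaner pass: bump the counter, wake waiters."""
        with self.cond:
            self.cycles += 1
            self.cond.notify_all()

    def flush(self, timeout):
        """Wait until the counter advances past its value at entry.

        Returns True if a cleaner pass completed, False if the timeout
        expired first. The timeout is an addition of this model only.
        """
        with self.cond:
            start = self.cycles
            return self.cond.wait_for(lambda: self.cycles != start,
                                      timeout=timeout)


if __name__ == "__main__":
    model = FlowCleanerModel()

    # Cleaner never completes a pass: the waiter is never woken.
    print("stuck cleaner, flush returned:", model.flush(timeout=0.2))

    # Cleaner completes a pass shortly after the waiter goes to sleep.
    timer = threading.Timer(0.05, model.cleaner_pass)
    timer.start()
    print("working cleaner, flush returned:", model.flush(timeout=2.0))
    timer.join()
```

In the first call nothing ever advances the counter, which corresponds to the
cleaner thread spinning without finishing a pass; in the second, a single
completed pass is enough to release the waiter.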