unresponsive process issue
travis.parker at gmail.com
Thu Mar 10 01:32:56 UTC 2016
I've twice now had a process get into a stuck state that I don't believe
should be possible: stopped (ps reports 'T', top reports 'STOP'), but
unresponsive to any signal, including CONT (KILL followed by CONT doesn't
clear it). It's a Redis process, in a jail, that is listening on a Unix
domain stream socket.
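The signal sequence described above can be sketched as follows. This is an
illustrative Python sketch, not the commands actually used: the helper names
(process_state, kill_then_cont) are hypothetical, and the pid is whatever
ps reports for the stuck process, passed on the command line.

```python
import os
import signal
import subprocess
import sys
import time


def process_state(pid):
    """Return the ps(1) state string for pid (e.g. 'T' when stopped),
    or None if ps no longer lists the process."""
    result = subprocess.run(
        ["ps", "-o", "state=", "-p", str(pid)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() or None


def kill_then_cont(pid):
    """Send KILL followed by CONT, the sequence the report says had no effect."""
    for sig in (signal.SIGKILL, signal.SIGCONT):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            # The process is already gone, so there is nothing left to signal.
            return


# Hypothetical usage: python3 stuck.py <pid>
if __name__ == "__main__" and len(sys.argv) == 2:
    target = int(sys.argv[1])
    print("state before:", process_state(target))
    kill_then_cont(target)
    time.sleep(1.0)  # give the kernel a moment to deliver the signals
    print("state after:", process_state(target))
```

For an ordinary stopped process the second call would be expected to report
no state at all, or a zombie until the parent reaps it; in the situation
described here it would presumably still report 'T'.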
Both times this came to my attention because the process stopped accepting
connections on the socket. Since it is also unresponsive to signals, I'm
left with no way to interact with it (or get rid of it). truss(1) doesn't
see any activity, and I couldn't get a backtrace from gdb(1), although
there's probably more information to be gleaned with better gdb-fu than mine.
It has about 200 connection fds open, but it's configured to accept up to
10000.
For the moment I have switched it over to TCP on localhost, and I'll have to
wait and see whether that works around whatever got it into this state (the
problem takes a few days to occur).
I wanted to reach out to this list in case there's something obvious I'm
missing before mailing freebsd-bugs. By googling I've found a few
descriptions of a similar issue (STOP state unresponsive to CONT), but they
all date from around 2004 and were all resolved.
Thanks in advance for any help,