Stuck processes in unkillable (STOP) state, listen queue overflow
Zara Kanaeva
zara.kanaeva at ggi.uni-tuebingen.de
Tue Oct 27 14:02:19 UTC 2015
Hello,
I have the same experience with apache and mapserver. It happens on
physical machine and ends with spontaneous reboot. This machine is
updated from FREEBSD 9.0 RELEASE to FREEBSD 10.2-PRERELEASE. Perhaps
this machine doesn't have enough RAM (only 8GB), but I think that must
not be a reason for a spontaneous reboot.
I had no such behavior with the same machine and FREEBSD 9.0 RELEASE
on it (I am not 100% sure, I have yet no possibility to test it).
Regards, Z. Kanaeva.
Zitat von "Nagy, Attila" <bra at fsn.hu>:
> Hi,
>
> Recently I've started to see a lot of cases, where the log is full
> with "listen queue overflow" messages and the process behind the
> network socket is unavailable.
> When I open a TCP to it, it opens but nothing happens (for example I
> get no SMTP banner from postfix, nor I get a log entry about the new
> connection).
>
> I've seen this with Java programs, postfix and redis, basically
> everything which opens a TCP and listens on the machine.
>
> For example, I have a redis process, which listens on 6381. When I
> telnet into it, the TCP opens, but the program doesn't respond.
> When I kill it, nothing happens. Even with kill -9 yields only this state:
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAN
> 776 redis 2 20 0 24112K 2256K STOP 3 16:56
> 0.00% redis-
>
> When I tcpdrop the connections of the process, tcpdrop reports
> success for the first time and failure for the second (No such
> process), but the connections remain:
> # sockstat -4 | grep 776
> redis redis-serv 776 6 tcp4 *:6381 *:*
> redis redis-serv 776 9 tcp4 *:16381 *:*
> redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
> redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
> redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
> redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
> redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
> redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
> redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
> redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided,
> or not known
> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided,
> or not known
> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
> tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
> # sockstat -4 | grep 776
> redis redis-serv 776 6 tcp4 *:6381 *:*
> redis redis-serv 776 9 tcp4 *:16381 *:*
> redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
> redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
> redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
> redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
> redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
> redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
> redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
> redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
>
> $ procstat -k 776
> PID TID COMM TDNAME KSTACK
> 776 100725 redis-server - mi_switch
> sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall
> Xfast_syscall
> 776 100744 redis-server - mi_switch
> thread_suspend_switch thread_single exit1 sigexit postsig ast
> doreti_ast
>
> I can do nothing to get out from this state, only reboot helps.
>
> The OS is stable/10 at r289313, but I could observe this behaviour with
> earlier releases too.
>
> The dmesg is full with lines like these:
> sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193
> already in queue awaiting acceptance (3142 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
> already in queue awaiting acceptance (3068 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
> already in queue awaiting acceptance (3057 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
> already in queue awaiting acceptance (3037 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
> already in queue awaiting acceptance (3015 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
> already in queue awaiting acceptance (3035 occurrences)
>
> I guess this is the effect of the process freeze, not the cause (the
> listen queue fills up because the app can't handle the incoming
> connections).
>
> I'm not sure it matters, but some of the machines (and the above)
> runs on an ESX hypervisor (but as far as I can remember, I could see
> this on physical machines too, but I'm not sure about that).
> Also -so far- I could only see this where some "exotic" stuff ran,
> like a java or erlang based server (opendj, elasticsearch and
> rabbitmq).
>
> Also not sure about which triggers this. I've never seen this after
> some hours of uptime, at least some days or a week must've been
> passed to get stuck like the above.
>
> Any ideas about this?
>
> Thanks,
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
--
Dipl.-Inf. Zara Kanaeva
Heidelberger Akademie der Wissenschaften
Forschungsstelle "The role of culture in early expansions of humans"
an der Universität Tübingen
Geographisches Institut
Universität Tübingen
Ruemelinstr. 19-23
72070 Tuebingen
Tel.: +49-(0)7071-2972132
e-mail: zara.kanaeva at geographie.uni-tuebingen.de
-------
- Theory is when you know something but it doesn't work.
- Practice is when something works but you don't know why.
- Usually we combine theory and practice:
Nothing works and we don't know why.
More information about the freebsd-stable
mailing list