Help:: Listen queue overflow killing servers

Paul Macdonald paul at ifdnrg.com
Fri Jul 26 16:57:25 UTC 2019


On 26/07/2019 17:11, David Christensen wrote:
> On 7/26/19 4:58 AM, Paul Macdonald via freebsd-questions wrote:
>> Over the past few months I've seen several boxes (4 or 5) become 
>> unresponsive as a result of a Listen queue overflow state.
>
>> All are on ZFS and are std apache/php/mysql servers with nothing too 
>> exotic.
>
>> /var/log/messages typically shows:
>>
>>      kernel: sonewconn: pcb 0xfffff813395e3d58: Listen queue 
>> overflow: 193 already in queue awaiting acceptance (83 occurrences)
>>
>> netstat -Lan shows:
>>
>> tcp4  193/0/128                          x.x.x.x.443
>> tcp4  193/0/128                          x.x.x.x.80
>
>
> What Apache/ PHP/ MySQL applications?  Did you write them?  If not, 
> who did?  Is everything up to date?  Have you filed bug reports?
>
>
> Do the applications have logging or debugging capabilities?  Have you 
> enabled them?  What do they say?  Where is the blockage? Deadlock?
>
>
These were on servers with multiple vhosts, often running WordPress, 
but in one instance not (that one runs custom software we wrote in-house, 
and that's been in production for 19 years without this issue!).

I suspect it's too low level for application-level debugging.

All I know so far is:

                 - Servers become unresponsive, with Listen queue overflow 
messages in /var/log/messages.

                 - Unable to stop jails or even shut down; tcpdrop 
doesn't work (everything is in CLOSE_WAIT).

                 - On the occasion today (and I can't be 100% sure, but 
I suspect always), all the apache processes were in disk wait state, 
but this was on a big new box with a very tiny site (on NVMe).
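
To make the "apache in disk wait" observation easier to capture the next time a box wedges, here is a minimal sketch (not something run on the affected servers) that picks out processes whose state begins with D from output in the shape of `ps -axo pid,state,wchan,comm`. The function name, the `httpd` process name, and the sample rows (including the `chan_a` wait-channel placeholder) are assumptions made up for illustration.

```python
def disk_wait_processes(ps_output, name="httpd"):
    """Return (pid, state, wchan) for processes called `name` in disk wait.

    Expects text shaped like `ps -axo pid,state,wchan,comm` output:
    one header row, then pid / state / wait channel / command per line.
    A state beginning with "D" marks disk (uninterruptible) wait.
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or malformed line
        pid, state, wchan, comm = fields
        if comm.strip() == name and state.startswith("D"):
            hits.append((int(pid), state, wchan))
    return hits


# Hypothetical sample rows; the wait channel names are placeholders,
# not values observed on the affected servers.
SAMPLE_PS = """\
  PID STAT WCHAN  COMMAND
  812 DJ   chan_a httpd
  813 DJ   chan_a httpd
  901 SsJ  select sshd
  950 IJ   accept httpd
"""

if __name__ == "__main__":
    for pid, state, wchan in disk_wait_processes(SAMPLE_PS):
        print(f"pid {pid} state {state} waiting on {wchan}")
```

Recording the wait channel alongside the state is the point of the sketch: if every stuck apache process reports the same channel, that narrows down where to look.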

                All servers are on FreeBSD 12 with ZFS, and apache runs 
within a jail (ezjail).

                Multiple load patterns, but 2 out of the 5ish incidents 
don't make much sense, as there would have been very little load.

                Not reproducible: I have sieged a couple of the affected 
boxes with no effect (and the logs on a couple of boxes show no interesting 
traffic, just normal).

                        - siege -c 255 -r 2

                        (pretty stressful)

                The target server does now show something in the netstat 
queues (0-100/512), but apache stays out of disk wait, and siege is 
(un)successful as the target copes fine.

                I have run this multiple times with no problem, and have now 
generated about 100,000 more lines in the apache log than I saw when the 
server went down today (a 16C/32T + 128GB + NVMe machine went down with 
this earlier after 6600 hits).

                I've just hit it with 255 concurrent users over a period 
of 20 mins, and it doesn't blink.

                So it doesn't look like it's load (and that would 
have shown up in the logs anyway).
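
Since the queues only fill up when the problem strikes, it may help to watch them automatically. Below is a rough sketch, assuming the `netstat -Lan` line layout quoted above (protocol, then qlen/incqlen/maxqlen, then local address). The function names are hypothetical, and the third sample row is an invented stand-in for the healthy 0-100/512 readings seen during the siege runs.

```python
import re

# proto, then qlen/incqlen/maxqlen, then the local address
LISTEN_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)")


def parse_listen_queues(netstat_output):
    """Parse `netstat -Lan` style text into one dict per listening socket."""
    rows = []
    for line in netstat_output.splitlines():
        match = LISTEN_RE.match(line.strip())
        if not match:
            continue  # header rows and blank lines do not match
        proto, qlen, incqlen, maxqlen, addr = match.groups()
        rows.append({
            "proto": proto,
            "qlen": int(qlen),        # connections awaiting accept()
            "incqlen": int(incqlen),  # incomplete connections
            "maxqlen": int(maxqlen),  # configured queue limit
            "addr": addr,
        })
    return rows


def overflowing(rows):
    """Return sockets whose accept queue has reached or passed its limit."""
    return [row for row in rows if row["qlen"] >= row["maxqlen"]]


# First two rows are from the report above; the last is illustrative only.
SAMPLE_NETSTAT = """\
tcp4  193/0/128  x.x.x.x.443
tcp4  193/0/128  x.x.x.x.80
tcp4  100/0/512  x.x.x.x.8080
"""

if __name__ == "__main__":
    for row in overflowing(parse_listen_queues(SAMPLE_NETSTAT)):
        print(f"{row['addr']}: {row['qlen']} queued, limit {row['maxqlen']}")
```

Run from cron against real `netstat -Lan` output, something like this could log the moment a queue first reaches its limit, which might be easier to line up with other events than the kernel message alone.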



> David
-- 
-------------------------
Paul Macdonald
IFDNRG Ltd
Web and video hosting
-------------------------
t: 0131 5548070
m: 07970339546
e: paul at ifdnrg.com
w: http://www.ifdnrg.com
-------------------------
IFDNRG
40 Maritime Street
Edinburgh
EH6 6SA