Help:: Listen queue overflow killing servers

Paul Macdonald paul at ifdnrg.com
Fri Jul 26 19:34:49 UTC 2019


On 26/07/2019 19:56, David Christensen wrote:
> On 7/26/19 9:57 AM, Paul Macdonald via freebsd-questions wrote:
>>
>> On 26/07/2019 17:11, David Christensen wrote:
>>> On 7/26/19 4:58 AM, Paul Macdonald via freebsd-questions wrote:
>>>> Over the past few months I've seen several boxes (4 or 5) become
>>>> unresponsive as a result of a Listen queue overflow state.
>>>
>> so it doesn't look like it's load (and that would have shown up in
>> the logs anyway)
>
>
> Is this server in production?  If so, it would be prudent to migrate 
> services and data to another computer while you troubleshoot.
>
>
This has happened on 5 production boxes over the past few months, all
with different hardware and load profiles.



> I would turn on debugging and crank up logging everywhere -- kernel, 
> ZFS, Apache, MySQL, PHP, WP, app code, etc..  Make sure you have a big 
> and fast device/ virtual device for the logs and debug dumps.
>
>
That's a big job. We run 110+ servers, so I'd like to find something more
specific.
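
One narrower option than turning on debugging everywhere might be to sample the listen queues themselves. A minimal sketch in Python, assuming FreeBSD's `netstat -Lan` prints one socket per line with a qlen/incqlen/maxqlen column; the function names and the 0.8 threshold are illustrative and not from this thread:

```python
import re
import subprocess

# Matches lines such as "tcp4  0/0/128   *.80":
# protocol, then qlen/incqlen/maxqlen, then the local address.
_QUEUE_LINE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)")


def parse_listen_queues(text):
    """Return (proto, qlen, incqlen, maxqlen, address) tuples from netstat -L output."""
    rows = []
    for line in text.splitlines():
        match = _QUEUE_LINE.match(line.strip())
        if match:  # header lines do not match and are skipped
            proto, qlen, incqlen, maxqlen, addr = match.groups()
            rows.append((proto, int(qlen), int(incqlen), int(maxqlen), addr))
    return rows


def nearly_full(rows, threshold=0.8):
    """Keep sockets whose unaccepted-connection count is at or above threshold * maxqlen."""
    return [r for r in rows if r[3] > 0 and r[1] >= threshold * r[3]]


def snapshot(threshold=0.8):
    """Run netstat -Lan once and return the listen sockets that look close to overflowing."""
    out = subprocess.run(
        ["netstat", "-Lan"], capture_output=True, text=True, check=True
    ).stdout
    return nearly_full(parse_listen_queues(out), threshold)
```

Called periodically (for example from cron) on each box, snapshot() would list the sockets whose accept queue is near its limit at that moment, which might show which daemon stops accepting before the box goes unresponsive.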


> Are the stress tests hitting the server with "good" traffic?  Can you 
> send "bad" traffic?
>
>
No idea how to send bad traffic!
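
One simple kind of "bad" traffic is a client that connects and then never sends a request, which ties up connections without looking like normal load. A rough sketch, assuming Python on a test client and a target box you own; open_idle_connections and close_all are hypothetical helper names, not part of any tool mentioned here:

```python
import socket


def open_idle_connections(host, port, count, timeout=2.0):
    """Open up to `count` TCP connections and send nothing on any of them.

    Returns (held_sockets, failed_count). A rising failed count suggests the
    server has stopped completing or accepting new connections.
    """
    held = []
    failed = 0
    for _ in range(count):
        try:
            held.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:  # covers refused, reset, and timed-out connects
            failed += 1
    return held, failed


def close_all(socks):
    """Close every socket that open_idle_connections returned."""
    for sock in socks:
        sock.close()
```

Holding the returned sockets open while watching `netstat -Lan` on the server would show whether idle clients alone can push a listen queue toward its limit. The per-process file descriptor limit on the client caps how many connections one run can hold.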


> Do you have test suites for any of the components?  If so, run them. 
> As you troubleshoot, write new test scripts.
>
Components are not comparable across boxes, and one box that went down
runs only our custom code (which has worked for a decade).
>
> Can you capture real traffic and replay it -- preferably traffic that 
> elicits the bug(s)?
>
The issue doesn't seem to be that reproducible. I'll check, but I think
only 1 of the boxes has gone down more than once with the same issue.

(I can't capture traffic on all boxes.)

I wish it was more reproducible; I'd downgrade that server to 11.4
in a heartbeat (I suspect it's 12.0 related).

(I have seen historic reports of similar issues on IMAP boxes, which
obviously have large queues anyway.)

Weirdly, our IMAP boxes have been fine, and they have 10k connections all
the time.

I siege-tested the box that went down earlier today (16C/32T, 128GB
RAM, 1TB NVMe) and it didn't break a sweat after 300,000 connections.

I'm at a bit of a loss.



>
> David
-- 
-------------------------
Paul Macdonald
IFDNRG Ltd
Web and video hosting
-------------------------
t: 0131 5548070
m: 07970339546
e: paul at ifdnrg.com
w: http://www.ifdnrg.com
-------------------------
IFDNRG
40 Maritime Street
Edinburgh
EH6 6SA
----------------------------------------------------



