11.0 stuck on high network load
Julien Charbon
julien.charbon at gmail.com
Mon Sep 26 09:33:31 UTC 2016
Hi Slawa,
On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote:
> On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote:
>>> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote:
>>>> On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote:
>>>>> You can also use Dtrace and lockstat (especially with the lockstat -s
>>>>> option):
>>>>>
>>>>> https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks
>>>>> https://www.freebsd.org/cgi/man.cgi?query=lockstat&manpath=FreeBSD+11.0-RELEASE
>>>>>
>>>>> But I am less familiar with Dtrace/lockstat tools.
>>>>
>>>> I am still use old kernel and got lockdown again.
>>>> Try using lockstat (I am save more output), interesting may be next:
>>>>
>>>> R/W writer spin on writer: 190019 events in 1.070 seconds (177571 events/sec)
>>>>
>>>> -------------------------------------------------------------------------------
>>>> Count indv cuml rcnt nsec Lock Caller
>>>> 140839 74% 74% 0.00 24659 tcpinp tcp_tw_2msl_scan+0xc6
>>>>
>>>> nsec ------ Time Distribution ------ count Stack
>>>> 4096 | 913 tcp_twstart+0xa3
>>>> 8192 |@@@@@@@@@@@@ 58191 tcp_do_segment+0x201f
>>>> 16384 |@@@@@@ 29594 tcp_input+0xe1c
>>>> 32768 |@@@@ 23447 ip_input+0x15f
>>>> 65536 |@@@ 16197
>>>> 131072 |@ 8674
>>>> 262144 | 3358
>>>> 524288 | 456
>>>> 1048576 | 9
>>>> -------------------------------------------------------------------------------
>>>> Count indv cuml rcnt nsec Lock Caller
>>>> 49180 26% 100% 0.00 15929 tcpinp tcp_tw_2msl_scan+0xc6
>>>>
>>>> nsec ------ Time Distribution ------ count Stack
>>>> 4096 | 157 pfslowtimo+0x54
>>>> 8192 |@@@@@@@@@@@@@@@ 24796 softclock_call_cc+0x179
>>>> 16384 |@@@@@@ 11223 softclock+0x44
>>>> 32768 |@@@@ 7426 intr_event_execute_handlers+0x95
>>>> 65536 |@@ 3918
>>>> 131072 | 1363
>>>> 262144 | 278
>>>> 524288 | 19
>>>> -------------------------------------------------------------------------------
>>>
>>> This is interesting, it seems that you have two call paths competing
>>> for INP locks here:
>>>
>>> - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and
>>>
>>> - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)
>>
>> My current hypothesis:
>>
>> nginx do write() (or may be close()?) to socket, kernel lock
>> first inp in V_twq_2msl, happen callout for pfslowtimo() on the same
>> CPU core and tcp_tw_2msl_scan infinity locked on same inp.
>>
>> In this case you modification can't help, before next try we need some
>> like yeld().
>
> Or may be locks leaks.
> Or both.
You are totally right, pfslowtimo()/tcp_tw_2msl_scan(reuse=0) is
infinitely blocked on INP_WLOCK() by "something" (that could be related
to write()).
As I reached my limit of debugging without WITNESS, could you share
your /etc/sysctl.conf, /boot/loader.conf files? And any specific
configuration you have (like having a Nginx workers affinity, Nginx
special options, etc.). Like that I can try to reproduce it on releng/11.0.
Thanks.
--
Julien
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 496 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160926/33c6123a/attachment-0001.sig>
More information about the freebsd-stable
mailing list