11.0 stuck on high network load
Julien Charbon
jch at freebsd.org
Mon Sep 26 11:57:13 UTC 2016
Hi Slawa,
On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote:
> On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote:
>> On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote:
>>>
>>> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote:
>>>> On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote:
>>>>> You can also use Dtrace and lockstat (especially with the lockstat -s
>>>>> option):
>>>>>
>>>>> https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks
>>>>> https://www.freebsd.org/cgi/man.cgi?query=lockstat&manpath=FreeBSD+11.0-RELEASE
>>>>>
>>>>> But I am less familiar with Dtrace/lockstat tools.
>>>>
>>>> I am still using the old kernel and hit the lockup again.
>>>> I tried lockstat (I saved more output); the interesting part may be this:
>>>>
>>>> R/W writer spin on writer: 190019 events in 1.070 seconds (177571 events/sec)
>>>>
>>>> -------------------------------------------------------------------------------
>>>> Count indv cuml rcnt nsec Lock Caller
>>>> 140839 74% 74% 0.00 24659 tcpinp tcp_tw_2msl_scan+0xc6
>>>>
>>>> nsec ------ Time Distribution ------ count Stack
>>>> 4096 | 913 tcp_twstart+0xa3
>>>> 8192 |@@@@@@@@@@@@ 58191 tcp_do_segment+0x201f
>>>> 16384 |@@@@@@ 29594 tcp_input+0xe1c
>>>> 32768 |@@@@ 23447 ip_input+0x15f
>>>> 65536 |@@@ 16197
>>>> 131072 |@ 8674
>>>> 262144 | 3358
>>>> 524288 | 456
>>>> 1048576 | 9
>>>> -------------------------------------------------------------------------------
>>>> Count indv cuml rcnt nsec Lock Caller
>>>> 49180 26% 100% 0.00 15929 tcpinp tcp_tw_2msl_scan+0xc6
>>>>
>>>> nsec ------ Time Distribution ------ count Stack
>>>> 4096 | 157 pfslowtimo+0x54
>>>> 8192 |@@@@@@@@@@@@@@@ 24796 softclock_call_cc+0x179
>>>> 16384 |@@@@@@ 11223 softclock+0x44
>>>> 32768 |@@@@ 7426 intr_event_execute_handlers+0x95
>>>> 65536 |@@ 3918
>>>> 131072 | 1363
>>>> 262144 | 278
>>>> 524288 | 19
>>>> -------------------------------------------------------------------------------
>>>
>>> This is interesting: it seems that two call paths are competing
>>> for the INP lock here:
>>>
>>> - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and
>>>
>>> - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)
>>
>> My current hypothesis:
>>
>> nginx does a write() (or maybe a close()?) on the socket and the
>> kernel locks the first inp in V_twq_2msl; then the callout for
>> pfslowtimo() fires on the same CPU core and tcp_tw_2msl_scan()
>> spins forever on that same inp.
>>
>> In this case your modification can't help; before the next attempt
>> we need something like yield().
>
> Or maybe lock leaks.
> Or both.
  Actually, one extra debugging step you can take is launching
lockstat with the extra options below:

 -H     for hold-lock statistics
 -P     to get the overall time
 -s 20  to get the stack traces

to see who is holding the INP lock for so long, for example:
lockstat -H -P -s 20 sleep 10 (the trailing command just sets the
sampling window).  Thanks to Hiren for pointing out the -H option.
--
Julien