svn commit: r304436 - in head: . sys/netinet

Adrian Chadd adrian at freebsd.org
Fri Aug 26 21:32:02 UTC 2016


Hi,

It's pcb lock contention.

-adrian


On 26 August 2016 at 08:13, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> On Fri, Aug 26, 2016 at 04:01:14PM +0100, Bruce Simpson wrote:
>
>> Slawa,
>>
>> I'm afraid this may be a bit of a non sequitur. Sorry, I seem to be
>> missing something. As I understand it, this thread is about Ryan's change
>> to netinet for broadcast.
>>
>> On 26/08/16 15:49, Slawa Olhovchenkov wrote:
>> > On Sun, Aug 21, 2016 at 03:04:00AM +0300, Slawa Olhovchenkov wrote:
>> >> On Sun, Aug 21, 2016 at 12:25:46AM +0100, Bruce Simpson wrote:
>> >>> Whilst I agree with your concerns about multipoint, I support the
>> >>> motivation behind Ryan's original change: optimize the common case.
>> >>
>> >> Oh, common case...
>> >> I have pmc profiling of TCP output, and looking at this SVG picture
>> >> I can't find any simple way.
>> >> Do you want to look too?
>> >
>> > At peak network traffic (more than 25K connections, about 20 Gbit/s
>> > total traffic), half of the cores are fully utilised by the network stack.
>> >
>> > This is a flamegraph from one core: http://zxy.spb.ru/cpu10.svg
>> > This is the same, but with the stack cut off at ixgbe_rxeof for a more
>> > unified TCP/IP stack view: http://zxy.spb.ru/cpu10u.svg
>> ...
>>
>> I appreciate that you've taken the time to post a flamegraph (a
>> fashionable visualization) of relative performance in the FreeBSD
>> networking stack.
>>
>> Sadly, I am mostly out of my depth for looking at stack wide performance
>> for the moment; for the things I look at involving FreeBSD at work just
>> at the moment, I would not generally go down there except for specific
>> performance issues (e.g. with IEEE 1588).
>>
>> It sounds as though perhaps you should raise a wider discussion about
>> your results on -net. I would caution you, however, that the Function
>> Boundary Tracing (FBT) provider for DTrace can introduce a fair amount of
>> noise to the raw performance data because of the trap mechanism it uses.
>> This ruled it out for one of my own studies requiring packet-level accuracy.
>>
>> Whilst raw pmc(4) profiles may require more post-processing, they will
>> provide less equivocal data (and a better fix) on the hot path, also due
>> to being sampled effectively on a PMC interrupt (a gather stage that polls
>> the core+uncore MSRs) rather than purely on a software timer interrupt.
>
> Thanks for the answer; I will now try to start a discussion on -net.

