svn commit: r247319 - in projects/calloutng/sys: kern sys

Alexander Motin mav at FreeBSD.org
Tue Feb 26 15:57:54 UTC 2013


On 26.02.2013 17:49, Attilio Rao wrote:
> On Tue, Feb 26, 2013 at 4:46 PM, Alexander Motin <mav at freebsd.org> wrote:
>> On 26.02.2013 17:28, Attilio Rao wrote:
>>> On Tue, Feb 26, 2013 at 4:25 PM, Alexander Motin <mav at freebsd.org> wrote:
>>>> Author: mav
>>>> Date: Tue Feb 26 15:25:43 2013
>>>> New Revision: 247319
>>>> URL: http://svnweb.freebsd.org/changeset/base/247319
>>>>
>>>> Log:
>>>>   Optimize callout_process() to use fewer variables and fewer conditions to
>>>>   implement the same logic.  Now it fits better into CPU registers, and
>>>>   according to PMC significantly reduces the number of resource stalls,
>>>>   reducing the CPU time it consumes during the usleep(1) benchmark by 30%.
>>>
>>> Is that all improved i-cache capacity and improved dynamic branch
>>> prediction (hwpmc has counters for both FWIW)?
>>
>> I don't think I-cache capacity is significant there, as the loop is quite
>> small. I believe it was branch misprediction, complicated by the additional
>> latency of memory accesses. I haven't analyzed the cause more deeply, as the
>> PMC man pages are neither the most informative nor the easiest reading.
> 
> Well, I-cache is really very small, so I think you may get some
> improvement also for the function you were trying to optimize.
> You can list all the available counters by running: pmccontrol -L
> From there you may find some hwpmc counters showing i-cache and dynamic
> branch prediction miss statistics.

I've noticed that even without any branching changes, removing one
variable, which allowed the compiler to reuse its register (checked in the
assembler output), gave a measurable result. I don't think that would
happen if the cause were on the instruction-fetching side. But sure, I'll
continue experimenting with HWPMC.
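
To illustrate the kind of change in question, here is a minimal sketch.
It is not the actual callout_process() code; the function names and the
deadline array are hypothetical. Both functions return the earliest
deadline later than `now`, but the second drops the `found` flag and one
condition by using INT64_MAX as a sentinel, so the loop keeps one fewer
value live. Whether this actually saves a register depends on the
compiler and the target, so the generated assembly should be checked.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: earliest deadline later than now, tracked with
 * a separate "found" flag.  Returns INT64_MAX if nothing is pending.
 */
static int64_t
next_deadline_flag(const int64_t *d, size_t n, int64_t now)
{
	int64_t best = 0;
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (d[i] <= now)
			continue;
		if (!found || d[i] < best) {
			best = d[i];
			found = 1;
		}
	}
	return (found ? best : INT64_MAX);
}

/*
 * Same result with one variable and one condition fewer: the sentinel
 * INT64_MAX in "best" stands in for the "found" flag.
 */
static int64_t
next_deadline_sentinel(const int64_t *d, size_t n, int64_t now)
{
	int64_t best = INT64_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		if (d[i] > now && d[i] < best)
			best = d[i];
	}
	return (best);
}
```

Comparing the output of `cc -O2 -S` for the two functions is one way to
see whether the second variant really frees a register in the loop.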

-- 
Alexander Motin

