cvs commit: src/sys/kern init_main.c kern_malloc.c md5c.c subr_autoconf.c subr_mbuf.c subr_prf.c tty_subr.c vfs_cluster.c vfs_subr.c

Tue Jul 22 23:03:25 PDT 2003

In message <3F1E1EDC.5124C137 at imimic.com>, "Alan L. Cox" writes:
>Poul-Henning Kamp wrote:
>> 
>> In message <20030722233258.6913E2A7EA at canning.wemm.org>, Peter Wemm writes:
>> >"Poul-Henning Kamp" wrote:
>> >
>> >>    If Y < X, then you have by definition a performance gain.
>> >
>> >Only if you look at the classic model where you ignore things like
>> >speculation and assume that every instruction is executed exactly once etc.
>> >Mainframe optimization strategy is not necessarily applicable to
>> >contemporary cpus.
>> ...
>> >I suspect Alan Cox already knows the answer to 'which is faster' in
>> >the vm_object_backing_scan() case and he's waiting for you to put your foot
>> >in it. :-)
>> 
>
>Ok, here goes: I've actually measured it.  :-)
>
>Specifically, I tested the "always inline" directive on the i386's
>pmap_changebit().  This function is used by several pmap functions,
>including pmap_clear_reference().  I did not, however, spend enough time
>to construct a test that I considered conclusive.

Ok, and what was the result of your measurement?
Did it help or not?

>Now, let's look at code size.  On one hand, inlining pmap_changebit()
>increases the overall size of the kernel.  On the other hand, it reduces
>the size of the code that implements pmap_clear_reference() from ~400
>bytes to ~350 bytes, mostly through the elimination of conditional
>branches.  My experience is that the latter notion of code size is more
>likely to correlate with real performance than the former.

Compile without inline, check object size:
	   text    data     bss     dec     hex filename
	  15485     160     548   16193    3f41 pmap.o
Compile with inline, check object size:
	   text    data     bss     dec     hex filename
	  16029     160     548   16737    4161 pmap.o

The uninlined function is 368 bytes, so three copies of the full
function would be 1104 bytes.  The text segment grew by only 544
bytes (16029 - 15485), so each inlined copy costs about "half of
the function".
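
The arithmetic behind that estimate can be written down as a small
sketch.  The numbers are the ones quoted above; the identifier names
are made up for illustration, and the sketch does not try to account
for whether an out-of-line copy of the function remains in the object.

```c
#include <stdio.h>

/* Figures quoted in this message; the names are illustrative only. */
enum {
	TEXT_NOINLINE = 15485,	/* pmap.o text size, compiled without inline */
	TEXT_INLINE   = 16029,	/* pmap.o text size, compiled with inline */
	FUNC_SIZE     = 368,	/* size of the uninlined function */
	COPIES        = 3	/* number of inlined copies assumed above */
};

/* Text growth actually observed after inlining. */
static int
text_growth(void)
{
	return (TEXT_INLINE - TEXT_NOINLINE);
}

/* Growth expected if every copy were a full copy of the function. */
static int
naive_growth(void)
{
	return (FUNC_SIZE * COPIES);
}

/* Average cost of one inlined copy, rounded down. */
static int
growth_per_copy(void)
{
	return (text_growth() / COPIES);
}

/* Print the comparison in one line. */
static void
print_report(void)
{
	printf("grew %d bytes, naive %d bytes, %d bytes per copy of %d\n",
	    text_growth(), naive_growth(), growth_per_copy(), FUNC_SIZE);
}
```

Calling print_report() from a main() shows a per-copy cost of 181
bytes against a 368 byte function, which is where "half" comes from.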

Reasoning like this would satisfy me that people have actually done
some amount of work to test their inlining and, given the
tight-loop use in this case, would convince me of a probable
performance improvement.

But obviously, when GCC is able to ignore the inlining and we don't
notice, then people have not done any such reasoning...
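
One way to notice is to make the request explicit and then look at the
result.  The following is only a sketch: change_bit() and its two
callers are hypothetical stand-ins, not the real pmap code, and the
bit values are just examples.  A plain "inline" is a hint that GCC
may decline; the always_inline attribute asks GCC to inline
regardless, and -Winline asks it to warn when a function declared
inline is not inlined.

```c
/*
 * Hypothetical helper, not the real pmap_changebit().  The
 * always_inline attribute is a GCC extension.
 */
static inline __attribute__((always_inline)) unsigned
change_bit(unsigned pte, unsigned bit, int set)
{
	/* With a constant "set" at the call site, this branch can fold away. */
	return (set ? (pte | bit) : (pte & ~bit));
}

/* Hypothetical caller that always clears a bit. */
unsigned
clear_reference(unsigned pte)
{
	return (change_bit(pte, 0x20u, 0));
}

/* Hypothetical caller that always sets a bit. */
unsigned
set_modified(unsigned pte)
{
	return (change_bit(pte, 0x40u, 1));
}
```

Compiling this with and without the attribute and comparing size(1)
output for the object, or checking with nm(1) whether a separate
change_bit symbol still exists, is the kind of check meant here.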

As I have said many times already, I am not averse to people adding
inline when they have done their homework; I am only against
purely speculative inlining.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.