cvs commit: src/sys/kern init_main.c kern_malloc.c md5c.c subr_autoconf.c subr_mbuf.c subr_prf.c tty_subr.c vfs_cluster.c vfs_subr.c

David Schultz das at FreeBSD.ORG
Tue Jul 22 16:17:04 PDT 2003

On Tue, Jul 22, 2003, Poul-Henning Kamp wrote:
> Compiling the file with and without the inline, and forcing GCC
> to respect the inline finds:
> 	   text    data     bss     dec     hex filename
> inlined:  17408      76     420   17904    45f0 vm_object.o
> regular:  14944      76     420   15440    3c50 vm_object.o
>           -----
> 	   2464
> At least I find that 2k+ code is a non-trivial amount which is
> likely, through prefetch and cache flushing, to have a negative
> performance impact.

There's a reason I used this example earlier also.  ;-)
vm_object_backing_scan() is basically a template function that
gets instantiated into several variants that have a little bit of
code in common.  For a given VM object, control will pass to only
one instance of the inline function, so you likely won't get
better temporal locality by un-inlining it.  Moreover, the
un-inlined version is actually much larger than any of the inlined
instances due to (lack of) constant propagation / dead code
elimination, so it could potentially blow out your ICACHE all by
itself.  But I think the main performance hit here is actually
going to be all the branches that can no longer be optimized out
of the inner loop that iterates over all the pages in the object.

> Given two conflicting arguments, we need to resort to reality for
> the answer, and what we need to reinstate the inline on an informed
> basis is a realistic benchmark which indicates a positive performance
> impact of the (respected) inline request.

We're in agreement here, except that I believe the burden of proof
should be on the person going around and removing others' inlines.
I think that in this case, inlining will help by a very modest
amount, but I could be wrong.  Without proof to the contrary, I
think it's safest in general to assume that the original author
knew what he was doing (but see below).

> There are several issues here Alan, and let us try to keep them
> separated to the extent possible, the two main issues are:
> 1. Should GCC _always_ respect when we put __inline in the the source
>    code ?

It's really ugly to have to say __always_inline in some places and
__inline in others.  I think the programmer should get what he
asks for when he says __inline, because the vast majority of
FreeBSD committers are smarter than gcc.
[Insert cynical reply here.]

gcc's feature of refusing to inline large functions isn't even
designed to reduce code size.  It's a kludge to prevent gcc from
falling off a big-O cliff when attempting to compile huge
functions---perhaps a response to some inane compiler benchmark.

> 2. When should we put __inline in the source code ?

This is the more important question to me.  Undoubtedly __inline
has been abused, but I'd rather see a real person fix them than
the compiler.  It shouldn't be too difficult to compile the kernel
with and without -fno-inline and compare the object file sizes.

There will be a few cases where it is obviously right to inline,
and a few cases where it is obviously wrong.  In most of the
remaining cases, such as vm_object_backing_scan(), the difference
between inlining or not will be far enough into the noise that
only careful benchmarking can provide a good answer.

Actually, it might be interesting to make a list of all the
functions in the kernel that contain inline calls sorted by the
bytes of bloat.  Then for all those grey areas, developers could
be asked to look at the list and reconsider their use of inlines,
and you wouldn't have to waste your talent trying to evaluate each one.

More information about the cvs-src mailing list