svn commit: r333461 - head/sys/amd64/amd64

Bruce Evans brde at optusnet.com.au
Fri May 11 01:38:33 UTC 2018


On Thu, 10 May 2018, Konstantin Belousov wrote:

> On Fri, May 11, 2018 at 03:31:46AM +1000, Bruce Evans wrote:
>> On Thu, 10 May 2018, Konstantin Belousov wrote:
>>
>>> Log:
>>>  Make fpusave() and fpurestore() on amd64 ifuncs.
>>>
>>>  From now on, linking amd64 kernel requires either lld or newer ld.bfd.
>>
>> This breaks building with gcc:
>>
>> XX cc1: warnings being treated as errors
>> XX ../../../amd64/amd64/fpu.c:195: warning: 'ifunc' attribute directive ignored [-Wattributes]
>> ...
>> XX ./machine/fpu.h:62: warning: previous declaration of 'fpurestore' was here
> Yes.  Nothing unexpected.

Thus it is not suitable for commit.

>> After building fpu.o with clang, linking doesn't require a newer ld, but
>> booting does -- after linking with an older ld, the boot panics with a
>> page fault near fork().
> Yes.  I noted this in the commit message.
>
> emaste will commit the check shortly which would prevent using inadequate
> linker for building kernel.

Now it is broken even for clang:

XX make: "../../../conf/../../../conf/kern.pre.mk" line 125: amd64 kernel requires linker ifunc support

This uses LINKER_FEATURES, but LINKER_FEATURES is not defined anywhere in
the sys tree.  It is defined in the src and host trees in bsd.linker.mk.
Kernel makefiles have been broken by adding a lot of dependencies on files
not in the sys tree, but bsd.linker.mk doesn't seem to be included.

Even running "make -V LINKER_FEATURES" to see what LINKER_FEATURES is set
to is broken (it gives the above error).

After backing out r333470, "make -V LINKER_FEATURES" works and gives
" build-id filter retpoline" with both clang and gcc.  LINKER_FEATURES
seems to be hard-coded somewhere.

Only the ld that will be used by the build knows its features.  The
ld is hard to find.  My gcc is a shell script with lots of -B paths that
eventually find ld.  Links should be done by ${CC} to find this ld.  But
the kernel uses ${LD}.  When this became incompatible earlier this year
after working for about 15 years despite its logical incompatibility,
I worked around the problem by adding "makeoptions LD=<path to my ld>".
My ld was pre-lld until yesterday.  For this test, it is a symlink to
the host's ld.  This supports ifuncs, but LINKER_FEATURES says that
it doesn't.

The host (freefall) also doesn't support ifuncs according to
"make -V LINKER_FEATURES" in src/bin/cat.

Only kern.pre.mk was broken in r333470.  kern.mk doesn't have this check.

It is a layering bug to not put such checks in kern.mk.

>> ...
>> This looks like a small or null pessimization.  The branch in the old
>> version is probably faster.  The following simple test on Haswell shows
>> that all reasonable versions have almost the same speed when cached:
> Yes, I already noted and mjg noted that ifuncs are directed through PLT.
> I remember that it was not the case when I did it the first time, but then
> both compiler and linker were different.

The test failed to explain why the branch-free version ran slower only
when written in asm (7.25 cycles instead of 7).  It was because I changed
the return value from (xsave ? 1 : 0) to (xsave ? 2 : 1) to better match
some asm versions.  gcc -O only produces branch-free code for '1 : 0', and
that is what I first tested for the C version (before that, I had forgotten
that branch-free code might be better).  gcc -O2 produces branch-free code
for '2 : 1'.  It generates the same code as for '1 : 0', then increments the
result.  Neither -O nor -O2 uses cmov for some reason.  The increment is 1
more instruction, but doesn't increase the time of 7 cycles.  This is all
with gcc-3.3.3.  gcc-4.2.1 -O is "smarter" and generates
"sbb %eax,%eax; add $2,%eax" to generate '2 : 1'.  This also doesn't change
the time of 7 cycles.
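
For reference, the shapes of the '2 : 1' selector described above can be
written in C roughly as follows.  This is only a sketch: the function names
are made up for illustration, it is not the code from the benchmark, and a
given compiler at a given -O level may well compile all three to the same
instructions.

```c
/*
 * Three ways to compute (xsave ? 2 : 1).  All names are hypothetical.
 */

/* Written with a conditional; the compiler may emit a branch for this. */
int
sel_branch(int xsave)
{
	if (xsave)
		return (2);
	return (1);
}

/* Compute '1 : 0' first, then increment the result. */
int
sel_inc(int xsave)
{
	return ((xsave != 0) + 1);
}

/* Mirror "sbb %eax,%eax; add $2,%eax": a -1 or 0 mask, then add 2. */
int
sel_sbb(int xsave)
{
	int mask = -(xsave == 0);	/* -1 if xsave is 0, else 0 */

	return (mask + 2);
}
```

All three return 1 for a zero argument and 2 for any nonzero argument; only
the generated code differs, and that depends on the compiler and flags.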

> I tried a quick experiment with adding -fvisibility=hidden to the kernel
> compilation, but it still leaves ifunc relocations on PLT (and PLT correctly
> consists only of ifuncs).
>
> I might look some more on it later.  Anyway, there are more uses, and SMAP
> plus pmap applications of ifunc are more clear.  I selected amd64/fpu.c
> for the initial commit to check toolchain in wild since this case is
> simplest and easy to diagnose if failing, I hope.

You need to find about 160 million uses per second at 4 GHz for this to give
a 1% optimization ("this" = 0.25 cycles).  -O2 and other optimizations give
an improvement in this range, but break debugging so I turn them off.  Except
for clang, the -finline* flags for turning off some optimizations don't
even work.  A typical stack trace for clang usually has
<value optimized out> for almost all values even for debuggers that should
be able to find the values in registers.  ifuncs might also give
<function name optimized out>.  Debugging is already very broken by inlining
small functions -- stack traces don't show the small functions or their
args, but show a caller several layers higher.  All for optimizations in the
1-5% range for kernels, or a small fraction of that for total time.
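
The break-even rate mentioned above is just the cycle budget for the target
gain divided by the cycles saved per call.  A small sketch of that
arithmetic (the function name and parameters are made up for illustration):

```c
/*
 * Calls per second needed for a per-call saving to amount to a given
 * fraction of all cycles.  Hypothetical helper, for illustration only.
 *
 *   clock_hz         CPU clock in cycles per second (e.g. 4e9)
 *   fraction         target gain as a fraction (e.g. 0.01 for 1%)
 *   cycles_per_call  cycles saved on each call (e.g. 0.25)
 */
double
breakeven_calls_per_sec(double clock_hz, double fraction,
    double cycles_per_call)
{
	return (clock_hz * fraction / cycles_per_call);
}
```

Calling it with 4e9, 0.01 and 0.25 reproduces the case in the paragraph
above.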

Bruce

