svn commit: r333461 - head/sys/amd64/amd64
Bruce Evans
brde at optusnet.com.au
Fri May 11 01:38:33 UTC 2018
On Thu, 10 May 2018, Konstantin Belousov wrote:
> On Fri, May 11, 2018 at 03:31:46AM +1000, Bruce Evans wrote:
>> On Thu, 10 May 2018, Konstantin Belousov wrote:
>>
>>> Log:
>>> Make fpusave() and fpurestore() on amd64 ifuncs.
>>>
>>> From now on, linking amd64 kernel requires either lld or newer ld.bfd.
>>
>> This breaks building with gcc:
>>
>> XX cc1: warnings being treated as errors
>> XX ../../../amd64/amd64/fpu.c:195: warning: 'ifunc' attribute directive ignored [-Wattributes]
>> ...
>> XX ./machine/fpu.h:62: warning: previous declaration of 'fpurestore' was here
> Yes. Nothing unexpected.
Thus it is not suitable for commit.
>> After building fpu.o with clang, linking doesn't reqire a newer ld, but
>> booting does -- after linking with an older ld, the boot panics with a
>> page fault near fork().
> Yes. I noted this in the commit message.
>
> emaste will commit the check shortly which would prevent using inadequate
> linker for building kernel.
Now it is broken even for clang:
XX make: "../../../conf/../../../conf/kern.pre.mk" line 125: amd64 kernel requires linker ifunc support
This uses a LINKER_FEATURES, but LINKER_FEATURES is not defined anywhere in
the sys tree. It is defined in the src amd host tree in bsd.linker.mk.
Kernel makefiles have been broken by adding a lot of dependencies on files
not in the sys tree, but bsd.linker.mk doesn't seem to be included.
Even "make -V LINKER_FEATURES" to determine what LINKER_FEATURES is is broken
(it gives the above error).
After backing out r333470, "make -V LINKER_FEATURES" works and gives
" build-id filter retpoline" with both clang and gcc. LINKER_FEATURES
seems to be hard-coded somewhere.
Only the ld that will be used by the build knows its features. The
ld is hard to find. My gcc is a shell script with lots of -B paths that
eventually find ld. Links should be done by ${CC} to find this ld. But
the kernel uses ${LD}. When this became incompatible earlier this year
after working for about 15 years despite its logical incompatibility,
I worked around the problem by adding "makeoptions LD=<path to my ld>".
My ld was pre-lld until yesterday. For this test, it is a symlink to
the host's ld. This supports ifuncs, but LINKER_FEATURES says that
it doesn't.
The host (freefall) also doesn't support ifuncs according to
"make -V LINKER_FEATURES" in src/bin/cat.
Only bsd.pre.mk was broken in r333470. kern.mk doesn't have this check.
It is a layering bug to not put such checks in kern.mk.
>> ...
>> This looks like a small or null pessimization. The branch in the old
>> version is probably faster. The following simple test on Haswell shows
>> that all reasonable versions have almost the same speed when cached:
> Yes, I already noted and mjg noted that ifuncs are directed through PLT.
> I remember that it was not the case when I did it the first time, but then
> both compiler and linker were different.
The test failed to explain why the branch-free version worked slower only
when written in asm (7.25 cycles instead of 7). It was because I changed
the return value from (xsave ? 1 : 0) to (xsave ? 2 : 1) to better match
some asm versions. gcc -O only produces branch-free code for '1 : 0', and
that is what I first tested for the C version (before that, I had forgotten
that brranch-free cide might be better). gcc -O2 produces branch-free code
for '2 : 1'. It generates the same code as for '1 : 0', then increments this.
Neither -O nor -O2 uses cmov for some reason. The increment is 1 more
instruction, but doesn't increase the time of 7 cycles. This is all with
gcc-3.3.3. gcc-4.2.1 -O is "smarter" and generates "sbb %eax,%eax; add $2,
%eax" to generate '2 : 1'. This also desn't change the time of 7 cycles.
> I tried a quick experiment with adding -fvisibility=hidden to the kernel
> compilation, but it still leaves ifunc relocations on PLT (and PLT correctly
> consists only of ifuncs).
>
> I might look some more on it later. Anyway, there are more uses, and SMAP
> plus pmap applications of ifunc are more clear. I selected amd64/fpu.c
> for the initial commit to check toolchain in wild since this case is
> simplest and easy to diagnose if failing, I hope.
You need to find about 16 million uses per second at 4 GHz for this to give
a 1% optimization ("this" = 0.25 cycles). -O2 and other optimizations give
an improvement in this range, but break debugging so I turn them off. Except
for clang, the -finline* flags for turning off some optimizations don't
even work. A typical stack trace for clang usually has
<value optimized out> for almost all values even for debuggers that should
be able to find the values in registers. ifuncs might also give
<function name optimized out>. Debugging is already very broken by inlining
small functions -- stack traces don't show the small functions or their
args, but who a caller several layers high. All for optimizations in the
1-5% range for kernels, or a small fraction of that for total time.
Bruce
More information about the svn-src-head
mailing list