svn commit: r274489 - in head/sys/amd64: amd64 include

Bruce Evans brde at optusnet.com.au
Sat Nov 22 07:18:03 UTC 2014



On Thu, 20 Nov 2014, Rui Paulo wrote:

> On Nov 13, 2014, at 14:11, Scott Long <scottl at FreeBSD.org> wrote:
>>
>> Author: scottl
>> Date: Thu Nov 13 22:11:44 2014
>> New Revision: 274489
>> URL: https://svnweb.freebsd.org/changeset/base/274489
>>
>> Log:
>>  Extend earlier addition of stack frames to most of support.S.  This makes
>>  stack traces in KDB, HWPMC, and DTrace much more reliable and useful.
>
> No performance differences?  The kernel enables/disables the compiler option to omit the frame pointer based on the kernel config file.  If DDB, DTrace, or HWPMC is enabled, the frame pointer is always saved in C functions.

That bug is only implemented for amd64 and powerpc:
- it is in Makefile.amd64.  It is hard-coded under the above options, and
   thus breaks any settings of -fno-omit-frame-pointer
   -fno-omit-leaf-frame-pointer in the user's options, depending on
   undocumented ordering of the options.  It also breaks profiling.
- it is in Makefile.powerpc unless DDB is configured.
- it is not in Makefile.i386.  files.i386 and files.pc98 take the necessary
   care to not blow away -fno-omit-frame-pointer in the user's options for
   atomic.c; however, all functions in atomic.c are leaf functions, so this
   may be broken now.  The null documentation in cc.1 doesn't say.
- it is in kmod.mk for some amd64 and powerpc.  There it breaks modules
   unconditionally.

The breakage for profiling is quite serious, since the frame pointer might
be dereferenced unconditionally.  However, amd64 and i386 still use my
optimization of avoiding the dereference unless profiling is enabled as
well as configured.  Asm code in them uses my related optimization of
not using a frame pointer at all for functions written in asm (ENTRY()
hides the details, and the details are arranged so as not to

The breakage is maximal for profiling of modules.  You could have a
kernel compiled for profiling or just DDB, DTrace, or HWPMC, but modules
not compiled for these.  Only broken modules can depend on kernel
options, and kmod.mk doesn't check the options anyway.  The default
is fail-unsafe for amd64 and powerpc.  It gives broken modules that
can never match the kernel profiling, DDB, DTrace or HWPMC options
unless these are hacked into individual module Makefiles.  This gives
crashes soon if any module is used by a kernel with profiling configured
and enabled.  DDB can make invalid dereferences of the frame pointer,
but these are trapped harmlessly (except someone broke the trap handler,
so it now does a stack trace of ddb internals; this spams the console
and risks a recursive trap).  I don't know if DTrace and HWPMC also
trap the dereferences.  Profiling certainly doesn't.

Kernel stack traces without DDB, DTrace or HWPMC on amd64 or DDB on
powerpc seem to be broken, even without modules.
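
For readers unfamiliar with how a frame-pointer-based stack trace works, here
is a minimal sketch in C.  It is not the DDB code.  It models the stack as an
array of words in which each frame is a pair (saved caller frame pointer,
return address), and it range-checks every frame pointer before using it, which
is roughly what a tracer must do if it cannot rely on a trap handler.  The
function name walk_frames() and the two-word frame layout are assumptions made
for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a simulated frame-pointer chain (hypothetical layout, not kernel code).
 *
 * stack[]  models stack memory as an array of words.
 * fp       is the index of the current frame; stack[fp] holds the index of
 *          the caller's frame and stack[fp + 1] holds the return address.
 *
 * Each fp is validated before it is dereferenced.  The walk stops when fp
 * falls outside the stack or when the chain fails to move toward higher
 * indices (callers live at higher addresses on a downward-growing stack).
 * Returns the number of return addresses stored in out[].
 */
size_t
walk_frames(const uintptr_t *stack, size_t nwords, size_t fp,
    uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && fp < nwords && nwords - fp >= 2) {
		size_t next = (size_t)stack[fp];

		out[n++] = stack[fp + 1];
		if (next <= fp)		/* end of chain, or garbage */
			break;
		fp = next;
	}
	return (n);
}
```

With stack = {2, 0x1001, 4, 0x2002, 0, 0x3003} and fp = 0, the walk records
three return addresses.  A garbage fp such as 1000 records none instead of
dereferencing it.  If a function in the middle of the call chain was compiled
without a frame pointer, its callee saves the frame pointer of the function
above it, so the chain silently skips a frame: the walk still terminates but
the trace is wrong.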

> Some of these functions are in the hot path, so if you didn't see any performance problem, I wonder if we should disable -fomit-frame-pointer always.

The performance problem is about 0.0001% of the time spent in the kernel
(which is hopefully a small fraction of the time spent in userland) on
modern OOE pipelined systems, since the frame pointer switch can run in
parallel on these systems and not many functions are well enough
scheduled to not have spare resources for this.  Especially on i386
where args are passed on the stack -- lots can run in parallel with
just loading the args, and the only problems are the extra code size
and extra memory accesses for switching the frame pointer.

I made up the 0.0001% number.  The number is tiny anyway, since there
aren't many asm functions so it would take an unusual workload to spend
even 1% of the time in these functions except possibly if they are
copyin/out of large data.  Then any extra 1-10 cycles in each function
might be 1% of this.  Functions like fubyte() are an exception -- even
1 extra cycle in them might have a measurable effect if they were called
a lot.  However, fubyte() isn't called a lot, and if it were then
a frame pointer is the least of its pessimizations.
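
The back-of-the-envelope reasoning above can be written out as a small
calculation.  This is only a sketch: the function name frame_overhead() and
all of the input values are hypothetical, chosen to match the rough figures in
the text (about 1% of time in the asm functions, and a few extra cycles per
call).

```c
/*
 * Estimate the fraction of total run time added by setting up a frame
 * pointer in the asm functions (illustrative model, not a measurement).
 *
 * asm_share        fraction of total time spent in the asm functions
 * extra_cycles     cycles added per call by the frame pointer switch
 * cycles_per_call  average cost of one call without the frame pointer
 */
double
frame_overhead(double asm_share, double extra_cycles, double cycles_per_call)
{
	return (asm_share * (extra_cycles / cycles_per_call));
}
```

For example, frame_overhead(0.01, 1.0, 100.0) gives 1% of 1%, i.e. about
0.01% of total time.  The model assumes the extra cycles are not hidden by
out-of-order execution, so on modern hardware it is an upper bound.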

My old optimizations to avoid frame pointers for profiling had a small
effect for i486's since i486's are in-order and only have 1 pipeline.
Even then, the effect was insignificant when profiling was enabled
since the main profiling routine took a long time and needs a frame
pointer anyway.

In a quick test of a microbenchmark in userland,
-fomit-frame-pointer -fomit-leaf-frame-pointer was 1 cycle slower
(30 -> 31) for one function but 1 cycle faster (26 -> 25) for another
function.  The benchmark is known to execute about 2 copies of the
function in parallel, so the extra instructions cost nothing if there
is a spare slot for them to run in every 20-30 cycles.

Compilers understand little of this.  I think using a frame pointer is
sometimes faster on x86 because instructions to access stack variables
are 1 byte longer when not using a frame pointer and this sometimes
costs.  OTOH, it might be best to set up a frame pointer but not
actually use it explicitly (it would only be used by DDB etc.), so
that the frame pointer accesses have no dependencies except each other.
Modern x86 hardware already does a lot of virtualization with special
cases for the frame pointer to reduce dependencies, but it shouldn't
hurt to reduce them explicitly.  This happens automatically in the
recent amd64 changes -- the frame pointer isn't used explicitly
before or after.
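
The 1-byte difference mentioned above comes from the x86 addressing-mode
encoding: a memory operand based on the stack pointer needs a SIB byte, while
one based on the frame pointer does not.  The sketch below spells out the
machine code for one 32-bit load with an 8-bit displacement in each form.  The
byte values follow the standard ModRM/SIB encoding; the helper names and the
choice of instruction are only for illustration.

```c
#include <stddef.h>

/* movl -8(%rbp),%eax: opcode 8B, ModRM 45, disp8 F8 */
const unsigned char mov_via_rbp[] = { 0x8b, 0x45, 0xf8 };

/* movl 8(%rsp),%eax: opcode 8B, ModRM 44, SIB 24, disp8 08 */
const unsigned char mov_via_rsp[] = { 0x8b, 0x44, 0x24, 0x08 };

/* Length in bytes of the frame-pointer-relative load. */
size_t
rbp_len(void)
{
	return (sizeof(mov_via_rbp));
}

/* Length in bytes of the stack-pointer-relative load. */
size_t
rsp_len(void)
{
	return (sizeof(mov_via_rsp));
}

/* Extra code bytes for n such accesses when no frame pointer is used. */
size_t
extra_bytes(size_t n)
{
	return (n * (sizeof(mov_via_rsp) - sizeof(mov_via_rbp)));
}
```

Here rbp_len() is 3 and rsp_len() is 4, so a function with 10 such stack
accesses is 10 bytes larger without a frame pointer.  That has to be weighed
against the bytes and cycles spent setting up and tearing down the frame
pointer.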

Bugs in this change include:
- it isn't done for all arches.  It would be harder on i386 since the
   args are on the stack and all stack offsets would change.
- the details aren't hidden in the ENTRY() macro.  Putting it there
   would make the necessary stack offsets a little harder to apply.
   The correct register to use for the stack offsets would also be
   a problem.  Hard-coding use of the frame pointer would make the
   offsets easier to get right and not depend on options, except it
   would make using a frame pointer non-optional.  Always using it
   wouldn't be too bad for leaf functions in asm, but there would
   still be complications for non-C functions.  Profiling avoids
   some of these complications basically by setting up a frame
   pointer for the profiling call but undoing that before ENTRY()
   returns.  This also allows the change to not interfere with
   profiling.
- it doesn't track the -fomit-*-frame-pointer option in CFLAGS.
   Compilers are bad about putting their options in predefines.  For
   profiling, the kernel options GPROF and GUPROF are used.

Bruce

