svn commit: r294697 - head/sys/net80211

Bruce Evans brde at optusnet.com.au
Mon Jan 25 07:15:30 UTC 2016


On Sun, 24 Jan 2016, Andriy Voskoboinyk wrote:

> Log:
>  net80211: reduce stack usage for ieee80211_ioctl*() methods.
>
>  Use malloc(9) for
>   - struct ieee80211req_wpaie2 (518 bytes, used in
>  ieee80211_ioctl_getwpaie())
>   - struct ieee80211_scan_req (128 bytes, used in setmlme_assoc_adhoc()
>  and ieee80211_ioctl_scanreq())
>
>  Also, drop __noinline workarounds; stack overflow is not reproducible
>  with recent compilers.
>
>  Tested with Clang 3.7.1, GCC 4.2.1 (from 9.3-RELEASE) and 4.9.4
>  (with -fstack-usage flag)

Inlining also breaks debugging.  It is best avoided in general by using
gcc -fno-inline-functions-called-once.  This flag is broken (not
supported) in clang.
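The per-function counterpart of that flag is the noinline attribute.  This is what the dropped __noinline workarounds used; FreeBSD's <sys/cdefs.h> spells it __noinline.  The sketch below is hypothetical.  fill_reply(), deeper_work() and handle_ioctl() only mimic the pattern that the removed ieee80211_ioctl.c comment describes: a helper with one caller and a big local buffer, whose caller then goes on to call deeper routines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Helper with a large local buffer and exactly one caller.  If the
 * compiler inlines it, scratch[] becomes part of the caller's frame and
 * stays allocated while the caller calls deeper routines.  __noinline__
 * keeps it in its own frame, which is popped when the helper returns.
 */
static __attribute__((__noinline__)) size_t
fill_reply(char *out, size_t outlen)
{
	char scratch[512];	/* big local; lives only in this frame */
	size_t n;

	memset(scratch, 'x', sizeof(scratch));
	n = outlen < sizeof(scratch) ? outlen : sizeof(scratch);
	memcpy(out, scratch, n);
	return (n);
}

/* Hypothetical stand-in for a deep call chain, e.g. a driver init routine. */
static __attribute__((__noinline__)) size_t
deeper_work(size_t n)
{
	return (n * 2);
}

/* The only caller of fill_reply(). */
size_t
handle_ioctl(char *out, size_t outlen)
{
	size_t n;

	n = fill_reply(out, outlen);	/* scratch[] is released here */
	return (deeper_work(n));	/* runs without scratch[] on the stack */
}
```

Whether a given gcc would actually inline fill_reply() without the attribute depends on the version and flags.  With the attribute, the function also stays a separate frame in debugger backtraces.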

> Modified: head/sys/net80211/ieee80211_ioctl.c
> ==============================================================================
> --- head/sys/net80211/ieee80211_ioctl.c	Sun Jan 24 23:28:14 2016	(r294696)
> +++ head/sys/net80211/ieee80211_ioctl.c	Sun Jan 24 23:35:20 2016	(r294697)
> -/*
> - * When building the kernel with -O2 on the i386 architecture, gcc
> - * seems to want to inline this function into ieee80211_ioctl()
> - * (which is the only routine that calls it). When this happens,
> - * ieee80211_ioctl() ends up consuming an additional 2K of stack
> - * space. (Exactly why it needs so much is unclear.) The problem
> - * is that it's possible for ieee80211_ioctl() to invoke other
> - * routines (including driver init functions) which could then find
> - * themselves perilously close to exhausting the stack.
> - *
> - * To avoid this, we deliberately prevent gcc from inlining this
> - * routine. Another way to avoid this is to use less agressive
> - * optimization when compiling this file (i.e. -O instead of -O2)
> - * but special-casing the compilation of this one module in the
> - * build system would be awkward.
> - */

Even with -O1 -mtune=i386 -fno-inline-functions-called-once, gcc-4.2.1
still breaks debugging of static functions by using a different calling
convention for them.  The first couple of args are passed in registers.
This breaks ddb stack traces on i386 not quite as badly as they have
always been broken on amd64.  (ddb cannot determine the number of args
or where they are on amd64, and used to print 5 words of stack garbage.
On i386, the args list is still printed and is almost as confusing as
garbage, since it is correct for extern functions but for static
functions it starts at about the third arg.)

I use __attribute__((__regparm__(0))) to unbreak the ABI for a few
functions designed to be called from within ddb as well as the main
code.  Some older functions with this design, like inb_(), still work
accidentally because they are extern.  I haven't figured out the
command-line flag to fix this yet.  Maybe just -mregparm.  I didn't
try hard to fix this since I was working on optimizations more than
debugging when I added the attribute.
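Here is a minimal sketch of that approach, assuming gcc or clang.  The regparm attribute only applies to 32-bit x86, so the macro drops it on other architectures.  DDB_CALLABLE and dbg_sum3() are hypothetical names.  __noinline__ and __used__ are added so that the function keeps its own symbol and body even when nothing else in the kernel calls it.

```c
/*
 * On i386, gcc may pass the first arguments of static functions in
 * registers, so ddb finds the wrong values on the stack.  regparm(0)
 * forces every argument onto the stack, as for extern functions.
 */
#if defined(__i386__)
#define	DDB_CALLABLE	__attribute__((__regparm__(0), __noinline__, __used__))
#else
#define	DDB_CALLABLE	__attribute__((__noinline__, __used__))
#endif

/* Hypothetical helper meant to be callable from the ddb prompt. */
static DDB_CALLABLE int
dbg_sum3(int a, int b, int c)
{
	return (a + b + c);
}
```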

Inlining really should reduce stack usage and thus be an optimization
that is actually useful for kernels.

Compilers are clueless about optimizations that are useful for kernels.
-Os should help, but is very broken in gcc-4.2.1 (it fails to compile
some files due to hitting inlining limits, and after working around
this, gives a negative optimization for space of about 30%).  -Os
works OK for clang -- it reduces the space a little and the time by
almost as much as -O2.  But optimizations like clang -O2 -march=native
are less than 10% faster than pessimizations like gcc-old -O1 -mtune=i386
-fno-inline-functions-called-once -fno-unit-at-a-time in kernels, in
micro-benchmarks that are favourable to the optimizations.  More like
1% for normal use.  (-fno-unit-at-a-time should reduce opportunities
for inlining static functions if -fno-inline-functions-called-once
doesn't work, but is also broken (not supported) in clang.)

Optimizations larger than 1% can possibly be obtained by using compiler
builtins, but compiler builtins are turned off by -ffreestanding.  I
no longer bother to turn some of them, like __builtin_memcpy(), back on.
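Here is a sketch of turning one builtin back on under -ffreestanding, assuming gcc or clang.  KMEMCPY, struct hdr and copy_hdr() are hypothetical.  The kernel still has to provide an ordinary memcpy() for the copies that the compiler decides not to expand.

```c
#include <stddef.h>

/*
 * -ffreestanding implies -fno-builtin, so a plain memcpy() call is
 * treated as an ordinary external function and is typically emitted
 * as a real call, even for small fixed sizes.  Calling the builtin
 * directly restores the compiler's knowledge of its semantics, so a
 * constant-size copy can be expanded into a few moves.
 */
#define	KMEMCPY(dst, src, len)	__builtin_memcpy((dst), (src), (len))

struct hdr {
	unsigned short	type;
	unsigned short	len;
	unsigned int	seq;
};

void
copy_hdr(struct hdr *dst, const struct hdr *src)
{
	KMEMCPY(dst, src, sizeof(*dst));	/* constant size: may be inlined */
}
```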

Bruce


More information about the svn-src-head mailing list