[rfc] removing -mpreferred-stack-boundary=2 flag for i386?
Alexander Best
arundel at freebsd.org
Sat Dec 24 09:37:53 UTC 2011
On Sat Dec 24 11, Bruce Evans wrote:
> On Fri, 23 Dec 2011, Alexander Best wrote:
>
> >is -mpreferred-stack-boundary=2 really necessary for i386 builds any
> >longer?
> >i built GENERIC (including modules) with and without that flag. the results
> >are:
>
> The same as it has always been. It avoids some bloat.
>
> >1654496 bytes with the flag set
> >vs.
> >1654952 bytes with the flag unset
>
> I don't believe this. GENERIC is enormously bloated, so it has size
> more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes
> is hard to believe. I get a savings of 9K (text) in a 5MB kernel.
> Changing the default target arch from i386 to pentium-undocumented has
> reduced the text space savings a little, since the default for passing
> args is now to preallocate stack space for them and store to this,
> instead of to push them; this preallocation results in more functions
> needing to allocate some stack space explicitly, and when some is
> allocated explicitly, the text space cost for this doesn't depend on
> the size of the allocation.
>
> Anyway, the savings are mostly from avoiding cache misses from
> sparse allocation on stacks.
>
> Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
> - KSTACK_PAGES on i386 is 2, while on amd64 it is 4. Using more
> stack might push something over the edge.
> - not much care is taken to align the initial stack or to keep the
> stack aligned in calls from asm code. E.g., any alignment for
> mi_startup() (and thus proc0?) is accidental. This may result
> in perfect alignment or perfect misalignment. Hopefully, more
> care is taken with thread startup. For gcc, the alignment is
> done bogusly in main() in userland, but there is no main() in
> the kernel. The alignment doesn't matter much (provided the
> perfect misalignment is still to a multiple of 4), but when it
> matters, the random misalignment that results from not trying to
> do it at all is better than perfect misalignment from getting it
> wrong. With 4-byte alignment, the only cases that it helps are
> with 64-bit variables.
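A simplified model of the stack cost (it ignores how gcc really lays out frames): -mpreferred-stack-boundary=N requests 2^N-byte alignment, so N=2 gives 4 bytes and gcc's i386 default of N=4 gives 16 bytes. An i386 frame is already a multiple of 4 bytes, so padding it to 16 adds 0, 4, 8 or 12 bytes per frame, in line with the "approximately 12 bytes of stack per function call" in the kern.mk comment touched by the patch. With KSTACK_PAGES at 2, that padding accumulates in every frame of a deep call chain on a fixed-size kernel stack. The function names below are hypothetical helpers for illustration only.

```c
#include <stddef.h>

/*
 * Bytes of alignment requested by -mpreferred-stack-boundary=N, i.e. 2^N.
 * N=2 gives 4 bytes; gcc's i386 default of N=4 gives 16 bytes.
 */
size_t
stack_boundary_bytes(unsigned int n)
{
	return ((size_t)1 << n);
}

/* Round size up to align, which must be a power of two. */
size_t
round_up_pow2(size_t size, size_t align)
{
	return ((size + align - 1) & ~(align - 1));
}

/*
 * Hypothetical helper (simplified model): extra stack a single frame
 * costs when it is padded to 2^n_wide bytes instead of 2^n_narrow bytes.
 */
size_t
frame_padding_cost(size_t frame, unsigned int n_narrow, unsigned int n_wide)
{
	return (round_up_pow2(frame, stack_boundary_bytes(n_wide)) -
	    round_up_pow2(frame, stack_boundary_bytes(n_narrow)));
}
```

For example, a 20-byte frame stays 20 bytes at N=2 but becomes 32 bytes at N=4, so frame_padding_cost(20, 2, 4) is 12.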
>
> >the gcc(1) man page states the following:
> >
> >"
> >This extra alignment does consume extra stack space, and generally
> >increases code size. Code that is sensitive to stack space usage,
> >such as embedded systems and operating system kernels, may want to
> >reduce the preferred alignment to -mpreferred-stack-boundary=2.
> >"
> >
> >the comment in sys/conf/kern.mk however sorta suggests that the default
> >alignment of 4 bytes might improve performance.
>
> The default stack alignment is 16 bytes, which unimproves performance.
maybe the part of the comment in sys/conf/kern.mk which mentions that a stack
alignment of 16 bytes might improve micro benchmark results should be removed.
this would prevent people (like me) from thinking that using a stack alignment
of 4 bytes is a compromise between size and efficiency. it isn't! currently a
stack alignment of 16 bytes has no advantages over one with 4 bytes on i386.
so specifying -mpreferred-stack-boundary=2 on i386 is absolutely mandatory.
please see the attached patch, which also introduces a line break in order to
describe the stack alignment issue in a paragraph of its own.
cheers.
alex
>
> clang handles stack alignment correctly (only does it when it is needed)
> so it doesn't need a -mpreferred-stack-boundary option and doesn't
> always break without alignment in main(). Well, at least it used to,
> IIRC. Testing it now shows that it does the necessary andl of the
> stack pointer for __aligned(32), but for __aligned(16) it now assumes
> that the stack is aligned by the caller. So it now needs
> -mpreferred-stack-boundary=2, but doesn't have it. OTOH, clang doesn't
> do the andl in main() like gcc does (unless you put a dummy __aligned(32)
> there), but requires crt to pass an aligned stack.
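The "andl of the stack pointer" is how a compiler realigns a frame when a local needs more alignment than the incoming stack is assumed to have; on i386 it is along the lines of andl $-32,%esp for a 32-byte requirement. Since -32 in two's complement is ~31, the AND clears the low 5 bits and rounds %esp down, and because the stack grows downward, rounding down only adds space. The sketch below emulates that masking on plain integers and checks a local declared with the GCC/clang aligned attribute (which FreeBSD's __aligned() macro expands to). The helper names are hypothetical, and it makes no claim about the exact prologue any particular compiler version emits.

```c
#include <stdint.h>

/*
 * Emulate "andl $-align, %esp": -align in two's complement equals
 * ~(align - 1), so the AND clears the low bits and rounds the pointer
 * DOWN.  The i386 stack grows downward, so this only ever adds space.
 * align must be a power of two.
 */
uint32_t
realign_sp(uint32_t sp, uint32_t align)
{
	return (sp & (uint32_t)-align);
}

/* Nonzero if addr is a multiple of align (a power of two). */
int
is_aligned(uintptr_t addr, uintptr_t align)
{
	return ((addr & (align - 1)) == 0);
}

/*
 * A local that asks for 32-byte alignment.  No common x86 ABI promises
 * 32-byte stack alignment on entry, so the compiler has to realign the
 * frame itself to honor the attribute.
 */
int
local_is_32_aligned(void)
{
	char buf[64] __attribute__((aligned(32)));

	buf[0] = 0;
	return (is_aligned((uintptr_t)buf, 32));
}
```

For example, realigning an %esp of 0xbfbfe9cc to 32 bytes yields 0xbfbfe9c0, 12 bytes further down the stack.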
>
> Bruce
-------------- next part --------------
Index: /usr/src/sys/conf/kern.mk
===================================================================
--- /usr/src/sys/conf/kern.mk (revision 228845)
+++ /usr/src/sys/conf/kern.mk (working copy)
@@ -30,12 +30,12 @@
# On i386, do not align the stack to 16-byte boundaries. Otherwise GCC 2.95
# and above adds code to the entry and exit point of every function to align the
# stack to 16-byte boundaries -- thus wasting approximately 12 bytes of stack
-# per function call. While the 16-byte alignment may benefit micro benchmarks,
-# it is probably an overall loss as it makes the code bigger (less efficient
-# use of code cache tag lines) and uses more stack (less efficient use of data
-# cache tag lines). Explicitly prohibit the use of FPU, SSE and other SIMD
-# operations inside the kernel itself. These operations are exclusively
-# reserved for user applications.
+# per function call. This makes the code bigger (less efficient use of code
+# cache tag lines) and uses more stack (less efficient use of data cache tag
+# lines).
+# Explicitly prohibit the use of FPU, SSE and other SIMD operations inside the
+# kernel itself. These operations are exclusively reserved for user
+# applications.
#
# gcc:
# Setting -mno-mmx implies -mno-3dnow