svn commit: r316064 - head/sys/boot/i386/boot2

Tue Mar 28 05:53:55 UTC 2017

On Mon, 27 Mar 2017, Warner Losh wrote:

> Log:
>  Fix build with path names with 'align' or 'nop' in them.
>
>  clang is now inserting .file directives with the entire path in
>  them. This is fine, except that our sed peephole optimizer removes
>  them if ${SRCTOP} or ${OBJTOP} contains 'align' or 'nop', leading to
>  build failures. The sed peephole optimizer removes useful things for
>  boot2 when used with clang, so restrict its use to gcc. Also, gcc no
>  longer generates nops to pad things, so there's no point in removing
>  it. Specialize the optimization to just removing the .align 4 lines to
>  preclude inadvertant path matching.
>
>  Sponsored by: Netflix
>  Commit brought to you the path: /home/xxx/NCD-3592-logsynopts/FreeBSD
>
> Modified:
>  head/sys/boot/i386/boot2/Makefile
>
> Modified: head/sys/boot/i386/boot2/Makefile
> ==============================================================================
> --- head/sys/boot/i386/boot2/Makefile	Mon Mar 27 22:34:43 2017	(r316063)
> +++ head/sys/boot/i386/boot2/Makefile	Mon Mar 27 22:53:36 2017	(r316064)
> @@ -91,10 +91,18 @@ boot2.o: boot2.s
>
> SRCS=	boot2.c boot2.h
>
> +# Gcc (4.2.1 at least) benefits from removing the forced alignment
> +# clang doesn't. Make the removal as specific as possible to avoid
> +# false positives (like path names with odd names for debugging info).
> +# In the past, gcc benefited from nop removal, but not in 4.2.1.
> +# Think of this as a poor-man's peephole optimizer for gcc 4.2.1
> boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c
> 	${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c
> -	sed -e '/align/d' -e '/nop/d' < boot2.s.tmp > boot2.s
> -	rm -f boot2.s.tmp
> +.if ${COMPILER_TYPE} == "gcc"
> +	sed -e '/\.align 4/d' < boot2.s.tmp > boot2.s
> +.else
> +	cp boot2.s.tmp boot2.s
> +.endif
>
> boot2.h: boot1.out
> 	${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \

That's silly; if you just fix the regexps to actually be as specifice as
possible, then they would just work with no need for large comments or
ifdefs.  '\.align' instead of 'align' is a good start.  ' 4' instead of
nothing is not so good.  There should also be a check for nothing
except whitespace before '\.align'.  The whitespace before '4' might
be a tab, but perhaps gcc only generates a space, and only generates
unwanted alignments of precisely 4, with no fill bytes.  The general
syntax would be too hard to parse, but it is easy to handle any digit,
and also '\.p2align', and allow nothing except whitespace after the
digit.

Last time I looked at this about 5 years ago, but after clang, the
whole optimization step was silly.  It made little difference for gcc
and no difference for clang as you say.  The nop removal that you
removed was silliest.  I think compilers stopped generating nops for
padding many years before that code was written.  In text sections,
They should generate something like '.p2align x,y,z', where x is the
alignment, y is empty (it is the fill byte, which defaults to nop, but
we don't want that but want a larger null instruction for multiple
bytes) and z is a limit on the amount of padding (x is only a
preference and z limits it).  This syntax would be hard to parse.
But CFLAGS has -Os, and that works for preventing padding in the
text section, so nop is never generated, not even with the spelling
'align'.

-Os doesn't work so well for other things.  For kernels, it is completely
broken with gcc-4.2.1 unless -fno-inline-functions is also used (it
breaks the inlining parameters whose misconfiguration is probably the
actual bug.  This doesn't seem to be a problem for boot code.  IIRC,
the only simply broken thing is that alignment is still done for variables.
This generates ".align 4".

-Os is differently broken for clang, so the simple editing doesn't help.
-Wa,-al to see what the assembler is doing is broken for clang.
-mpreferred-stack-boundary doesn't work for clang, so is not used, but
clang does better stack alignment by default.  This might still waste
some instructions, but not for every function.  -mno-align-long-strings
doesn't work for clang or for gcc > 4.2.1, so is not used.  clang does
quite different string padding by default, and IIRC it is worse.  But
boot2 hardly has any long strings.

boot code is hard to test since it is not modular.  This is much more broken
than when I last tested the size of boot2.o.  The most obvious new bugs are:
- something like SRCTOP breaks finding the relative path to ufs headers in
   a FreeBSD-11 sys tree
- ${CLANG_OPT_SMALL} is added for all compilers.  This is defined out-of-tree
   in /usr/share/mk/bsd.sys.mk.  (I tested on a host running nearly -current
   with a nearly FreeBSD-10 sys tree).  In boot2, CLANG_P{T_SMALL is
   wrong even for clang since it contains -mstack-alignment=8
   unconditionally, but the stack alignment should be 4 or even 1 with
   -m32 -Os.  Fixing this makes no difference for boot2.o compiled by
   clang.  The corresponding -mpreferred-stack-boundary=2 for gcc is
   configured correctly in boot2/Makefile.

After fixing this, simply "CC=gcc-4.2.1 make boot2.o".  Adding -Wa,-al
to CC also works (-Wa,-al is broken for clang, and clang produces a boot2.s
which cannot be assembled by gas).

The file sizes for FreeBSD-10 are:

   text   data    bss     dec      hex   filename
   5126     12   4940   10078   0x275e   clang/boot2-unhacked.o
   5126      9   4940   10075   0x275b   clang/boot2.o
   5036     32   4928    9996   0x270c   gcc/boot-unhacked.o
   5032     29   4928    9989   0x2705   gcc/boot2.o

In other words, the editing saves a whole 4 bytes of text and 3 bytes of
data for gcc.  It also saves a whole 3 bytes of data for clang, and you
just broke this :-).  (I would be surprised if the data isn't padded back
to a multiple of 4 anyway.)

After the reductions, the result with gcc is 86 bytes smaller than with
clang, instead of only 82 bytes smaller.

It is silly doing such small optimizations specially.

The editing actually removes 22 alignment statements, all spelled
"^<tab>\.align<space>4$'.  All of these are for data.  -Wa,-al shows
that the padding is 2 instances of 2 bytes and 1 of 3 bytes.  At
least half of the padding is for pessimal layout which is also a
style bug in the source code.  Arrays of characters are mixed with
32-bit variables.  Style(9) says to sort on size (it means on alignment/
element size), to minimize padding (mainly for structs) and get a conistent
order as a side effect.  The padding usually doesn't matter except for
structs.

After fixing the source code, there would be just 1 odd byte in a
character array that normally causes 3 bytes of padding after it.  It
takes considerably more magic to avoid getting this padding anyway in
the link step,  In the old a.out boot blocks, the linker forces 16-byte
alignment at the end of every object file.  The old boot blocks put
most of the data in a single object file to avoid an aerage of 8 bytes
of padding between files.  Elf is more flexible, and usually gives
4-byte alignment.  I think it is possible to match an odd byte at the
end of 1 object file with an odd 3 bytes at the beginning of the next
object file, but this usually doesn't happen.  The new boot2 uses
hacks like including ufsread.c to avoid having lots of object files.

I use the old a.out boot blocks updated to elf and EDD and have squeezed
them more extensively.  The squeezing is also silly since I could very
easily switch to 64K ones (much larger than that is not easy, since
they have to fit below 640K and a few multiples of 64K are already
used for buffers).  The limit on 8K is mainly a historical mistake.
A limit of 7.5K simplified booting from 15-sector floppies.  18-sector
floppies allowed easy expansion to 9K, but were unportable for a small
gain.  The default partition layout left only 8K below the ffs
superblock, but was only a default.

Bruce