svn commit: r316064 - head/sys/boot/i386/boot2
Warner Losh
imp at bsdimp.com
Tue Mar 28 08:03:26 UTC 2017
[[ sorry for the top post but it's quick ]]
Summary, in Bruce's own words
> the whole optimization step was silly.
I agree. on my -current system, clang compiled boot2 was 4 bytes
smaller after ripping it out. gcc was a whopping 7 bytes larger. Since
gcc has 156 still free, I think it's best to just retire this.
Maybe you can give me a hint as to which structs to look at to get
back those 7 bytes :)
Thanks for the writeup. It was most enlightening.
Warner
On Mon, Mar 27, 2017 at 11:53 PM, Bruce Evans <brde at optusnet.com.au> wrote:
> On Mon, 27 Mar 2017, Warner Losh wrote:
>
>> Log:
>> Fix build with path names with 'align' or 'nop' in them.
>>
>> clang is now inserting .file directives with the entire path in
>> them. This is fine, except that our sed peephole optimizer removes
>> them if ${SRCTOP} or ${OBJTOP} contains 'align' or 'nop', leading to
>> build failures. The sed peephole optimizer removes useful things for
>> boot2 when used with clang, so restrict its use to gcc. Also, gcc no
>> longer generates nops to pad things, so there's no point in removing
>> it. Specialize the optimization to just removing the .align 4 lines to
>> preclude inadvertant path matching.
>>
>> Sponsored by: Netflix
>> Commit brought to you the path: /home/xxx/NCD-3592-logsynopts/FreeBSD
>>
>> Modified:
>> head/sys/boot/i386/boot2/Makefile
>>
>> Modified: head/sys/boot/i386/boot2/Makefile
>>
>> ==============================================================================
>> --- head/sys/boot/i386/boot2/Makefile Mon Mar 27 22:34:43 2017
>> (r316063)
>> +++ head/sys/boot/i386/boot2/Makefile Mon Mar 27 22:53:36 2017
>> (r316064)
>> @@ -91,10 +91,18 @@ boot2.o: boot2.s
>>
>> SRCS= boot2.c boot2.h
>>
>> +# Gcc (4.2.1 at least) benefits from removing the forced alignment
>> +# clang doesn't. Make the removal as specific as possible to avoid
>> +# false positives (like path names with odd names for debugging info).
>> +# In the past, gcc benefited from nop removal, but not in 4.2.1.
>> +# Think of this as a poor-man's peephole optimizer for gcc 4.2.1
>> boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c
>> ${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c
>> - sed -e '/align/d' -e '/nop/d' < boot2.s.tmp > boot2.s
>> - rm -f boot2.s.tmp
>> +.if ${COMPILER_TYPE} == "gcc"
>> + sed -e '/\.align 4/d' < boot2.s.tmp > boot2.s
>> +.else
>> + cp boot2.s.tmp boot2.s
>> +.endif
>>
>> boot2.h: boot1.out
>> ${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \
>
>
> That's silly; if you just fix the regexps to actually be as specifice as
> possible, then they would just work with no need for large comments or
> ifdefs. '\.align' instead of 'align' is a good start. ' 4' instead of
> nothing is not so good. There should also be a check for nothing
> except whitespace before '\.align'. The whitespace before '4' might
> be a tab, but perhaps gcc only generates a space, and only generates
> unwanted alignments of precisely 4, with no fill bytes. The general
> syntax would be too hard to parse, but it is easy to handle any digit,
> and also '\.p2align', and allow nothing except whitespace after the
> digit.
>
> Last time I looked at this about 5 years ago, but after clang, the
> whole optimization step was silly. It made little difference for gcc
> and no difference for clang as you say. The nop removal that you
> removed was silliest. I think compilers stopped generating nops for
> padding many years before that code was written. In text sections,
> They should generate something like '.p2align x,y,z', where x is the
> alignment, y is empty (it is the fill byte, which defaults to nop, but
> we don't want that but want a larger null instruction for multiple
> bytes) and z is a limit on the amount of padding (x is only a
> preference and z limits it). This syntax would be hard to parse.
> But CFLAGS has -Os, and that works for preventing padding in the
> text section, so nop is never generated, not even with the spelling
> 'align'.
>
> -Os doesn't work so well for other things. For kernels, it is completely
> broken with gcc-4.2.1 unless -fno-inline-functions is also used (it
> breaks the inlining parameters whose misconfiguration is probably the
> actual bug. This doesn't seem to be a problem for boot code. IIRC,
> the only simply broken thing is that alignment is still done for variables.
> This generates ".align 4".
>
> -Os is differently broken for clang, so the simple editing doesn't help.
> -Wa,-al to see what the assembler is doing is broken for clang.
> -mpreferred-stack-boundary doesn't work for clang, so is not used, but
> clang does better stack alignment by default. This might still waste
> some instructions, but not for every function. -mno-align-long-strings
> doesn't work for clang or for gcc > 4.2.1, so is not used. clang does
> quite different string padding by default, and IIRC it is worse. But
> boot2 hardly has any long strings.
>
> boot code is hard to test since it is not modular. This is much more broken
> than when I last tested the size of boot2.o. The most obvious new bugs are:
> - something like SRCTOP breaks finding the relative path to ufs headers in
> a FreeBSD-11 sys tree
> - ${CLANG_OPT_SMALL} is added for all compilers. This is defined
> out-of-tree
> in /usr/share/mk/bsd.sys.mk. (I tested on a host running nearly -current
> with a nearly FreeBSD-10 sys tree). In boot2, CLANG_P{T_SMALL is
> wrong even for clang since it contains -mstack-alignment=8
> unconditionally, but the stack alignment should be 4 or even 1 with
> -m32 -Os. Fixing this makes no difference for boot2.o compiled by
> clang. The corresponding -mpreferred-stack-boundary=2 for gcc is
> configured correctly in boot2/Makefile.
>
> After fixing this, simply "CC=gcc-4.2.1 make boot2.o". Adding -Wa,-al
> to CC also works (-Wa,-al is broken for clang, and clang produces a boot2.s
> which cannot be assembled by gas).
>
> The file sizes for FreeBSD-10 are:
>
> text data bss dec hex filename
> 5126 12 4940 10078 0x275e clang/boot2-unhacked.o
> 5126 9 4940 10075 0x275b clang/boot2.o
> 5036 32 4928 9996 0x270c gcc/boot-unhacked.o
> 5032 29 4928 9989 0x2705 gcc/boot2.o
>
> In other words, the editing saves a whole 4 bytes of text and 3 bytes of
> data for gcc. It also saves a whole 3 bytes of data for clang, and you
> just broke this :-). (I would be surprised if the data isn't padded back
> to a multiple of 4 anyway.)
>
> After the reductions, the result with gcc is 86 bytes smaller than with
> clang, instead of only 82 bytes smaller.
>
> It is silly doing such small optimizations specially.
>
> The editing actually removes 22 alignment statements, all spelled
> "^<tab>\.align<space>4$'. All of these are for data. -Wa,-al shows
> that the padding is 2 instances of 2 bytes and 1 of 3 bytes. At
> least half of the padding is for pessimal layout which is also a
> style bug in the source code. Arrays of characters are mixed with
> 32-bit variables. Style(9) says to sort on size (it means on alignment/
> element size), to minimize padding (mainly for structs) and get a conistent
> order as a side effect. The padding usually doesn't matter except for
> structs.
>
> After fixing the source code, there would be just 1 odd byte in a
> character array that normally causes 3 bytes of padding after it. It
> takes considerably more magic to avoid getting this padding anyway in
> the link step, In the old a.out boot blocks, the linker forces 16-byte
> alignment at the end of every object file. The old boot blocks put
> most of the data in a single object file to avoid an aerage of 8 bytes
> of padding between files. Elf is more flexible, and usually gives
> 4-byte alignment. I think it is possible to match an odd byte at the
> end of 1 object file with an odd 3 bytes at the beginning of the next
> object file, but this usually doesn't happen. The new boot2 uses
> hacks like including ufsread.c to avoid having lots of object files.
>
> I use the old a.out boot blocks updated to elf and EDD and have squeezed
> them more extensively. The squeezing is also silly since I could very
> easily switch to 64K ones (much larger than that is not easy, since
> they have to fit below 640K and a few multiples of 64K are already
> used for buffers). The limit on 8K is mainly a historical mistake.
> A limit of 7.5K simplified booting from 15-sector floppies. 18-sector
> floppies allowed easy expansion to 9K, but were unportable for a small
> gain. The default partition layout left only 8K below the ffs
> superblock, but was only a default.
>
> Bruce
More information about the svn-src-all
mailing list