svn commit: r315522 - in head: contrib/binutils/ld/emulparams sys/conf

Sun Mar 19 02:25:35 UTC 2017

On Sun, 19 Mar 2017, Ed Maste wrote:

> Log:
>  use INT3 instead of NOP for x86 binary padding
>
>  We should never end up executing the inter-function padding, so we
>  are better off faulting than silently carrying on to whatever function
>  happens to be next.
>
>  Note that LLD will soon do this by default (although it currently pads
>  with zeros).
>
>  Reviewed by:	dim, kib
>  MFC after:	1 month
>  Sponsored by:	The FreeBSD Foundation
>  Differential Revision:	https://reviews.freebsd.org/D10047

Is this a pessimization?  Instruction prefetch near the end of almost
every function now fetches INT3 instead of NOP.  Both have to be
decoded to decide whether to speculatively execute them.  INT3 is
unlikely to be speculatively executed, but it takes extra work to
decide not to do so.

Functions normally end with a RET or unconditional JMP, and then branch
prediction usually prevents speculative execution beyond the end, so the
pessimization must be small.
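
To make the change concrete, here is a rough sketch of what the bytes
after a function's final RET look like before and after the commit.  The
16-byte alignment, the toy function body and the helper name
pad_function are assumptions for illustration only; the real padding is
emitted by the linker and assembler, not by code like this.

```python
# Sketch only: the alignment value and helper name are assumptions.
NOP = 0x90   # one-byte x86 NOP
INT3 = 0xCC  # one-byte x86 breakpoint trap


def pad_function(code: bytes, fill: int, align: int = 16) -> bytes:
    """Append fill bytes so the next function starts on an align boundary."""
    pad_len = -len(code) % align
    return code + bytes([fill]) * pad_len


# Toy function body: xor %eax,%eax ; ret
body = bytes([0x31, 0xC0, 0xC3])

old = pad_function(body, NOP)    # padding before the commit
new = pad_function(body, INT3)   # padding after the commit

print("NOP  padding:", old.hex(" "))
print("INT3 padding:", new.hex(" "))
```

Everything after the RET is what the prefetcher may pull in along with
the last useful bytes of the function; the only difference between the
two variants is whether those bytes decode as NOPs or as traps.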

Intra-function padding that is executed now uses "fat NOP" instructions
like null LEAs, since one long instruction is faster to execute than a
long string of NOPs.  This is less readable than NOPs or even INT3s.  Of
course, INT3 can't be used for executed padding.  I think fat NOPs are
also used for intra-function padding that is not executed.  There they
are just harder to read, unless they are needed to avoid the possible
pessimization in this commit.  The intra-function code with nops might
look like:

 		jmp	over
 		nop
 		# 7 nops altogether
 		nop
 	over:

or

 		jmp	over
 		nullpad7	# single 7 byte null padding instruction
 	over:

and it is likely to be CPU-dependent whether 7 possibly-speculatively
executed nops take more or fewer resources than 1 possibly-speculatively
executed fancy instruction.  I would expect the fancy instructions to
take more resources each.
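
For reference, the two forms above can be compared byte for byte.  The
sketch below uses two well-known 7-byte encodings in place of the
"nullpad7" placeholder: the null LEA and the multi-byte NOP.  Which one
(if either) a given toolchain emits is an assumption here, and the
instruction count says nothing about the actual cost on any particular
CPU; it only shows how many instructions the decoder has to look at.

```python
# Known x86 encodings; which one an assembler picks is an assumption.
NOP1 = bytes([0x90])                                        # nop
LEA7 = bytes([0x8D, 0xB4, 0x26, 0x00, 0x00, 0x00, 0x00])    # lea 0x0(%esi,%eiz,1),%esi
NOPL7 = bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00])   # nopl 0x0(%eax)


def summarize(insns):
    """Return (total bytes, instruction count) for a padding sequence."""
    return sum(len(i) for i in insns), len(insns)


variants = {
    "7 x nop": [NOP1] * 7,
    "1 x 7-byte lea": [LEA7],
    "1 x 7-byte nopl": [NOPL7],
}

for name, insns in variants.items():
    size, count = summarize(insns)
    print(f"{name}: {size} bytes, {count} instruction(s) to decode")
```

All three variants fill the same 7 bytes; they differ only in how many
instructions those bytes decode into and in how much work each one is.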

Fancy LEAs don't seem such a good choice for executed padding either.
amd64 uses lots of REX prefixes instead of fancy instructions, since
these are designed to have low overheads.  They certainly aren't
executed separately.  On i386, the same technique with lots of older
prefixes is not used much, probably because all prefixes have high
overheads on old i386's.  They can be as slow as NOPs although they
aren't executed separately.

Bruce