svn commit: r315522 - in head: contrib/binutils/ld/emulparams sys/conf
Rodney W. Grimes
freebsd at pdx.rh.CN85.dnsmgr.net
Sun Mar 19 15:46:35 UTC 2017
> On Sun, 19 Mar 2017, Ed Maste wrote:
>
> > Log:
> > use INT3 instead of NOP for x86 binary padding
> >
> > We should never end up executing the inter-function padding, so we
> > are better off faulting than silently carrying on to whatever function
> > happens to be next.
> >
> > Note that LLD will soon do this by default (although it currently pads
> > with zeros).
> >
> > Reviewed by: dim, kib
> > MFC after: 1 month
> > Sponsored by: The FreeBSD Foundation
> > Differential Revision: https://reviews.freebsd.org/D10047
>
> Is this a pessimization? Instruction prefetch near the end of almost
> every function now fetches INT3 instead of NOP. Both have to be
> decoded to decide whether to speculatively execute them. INT3 is
> unlikely to be speculatively executed, but it takes extra work to
> decide not to do so.
>
> Functions normally end with a RET or unconditional JMP, and then branch
> prediction usually prevents speculative execution beyond the end, so the
> pessimization must be small.
>
> Intra-function padding that is executed now uses "fat NOP" instructions
> like null LEAs, since these are faster to execute than a long string of
> NOPs. They are less readable than NOPs or even INT3s. Of course, INT3
> can't be used for executed padding. I think fat NOPs are also used for
> intra-function padding that is not executed. There they are just harder
> to read, unless they are needed to avoid the possible pessimization in
> this commit. The intra-function code with NOPs might look like:
>
>         jmp     over
>         nop
>         # 7 nops altogether
>         nop
> over:
>
> or
>
>         jmp     over
>         nullpad7        # single 7 byte null padding instruction
> over:
>
> and it is likely to be CPU-dependent whether 7 possibly-speculatively
> executed nops take more or less resources than 1 possibly-speculatively
> executed fancy instruction. I would expect the fancy instructions to
> take more resources each.
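To make the two alternatives above concrete, here is a rough sketch of
what the 7 padding bytes could look like in each case. The names pad7,
pad_kind, single_nops, null_lea7 and long_nop7 are made up for this
illustration; the null LEA encoding shown is only a no-op in 32-bit mode,
and which form a given assembler actually emits depends on the target
and its tuning options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical selector for the padding style; not from any real tool. */
enum pad_kind { PAD_SINGLE_NOPS, PAD_NULL_LEA, PAD_LONG_NOP };

/* Seven separate 1-byte NOP instructions (opcode 0x90). */
static const uint8_t single_nops[7] = {
    0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90
};

/* One 7-byte null LEA (i386 only): leal 0x0(%esi,%eiz,1),%esi */
static const uint8_t null_lea7[7] = {
    0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00
};

/* One 7-byte multi-byte NOP: nopl 0x0(%eax) */
static const uint8_t long_nop7[7] = {
    0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00
};

/*
 * Copy 7 bytes of padding into dst and return how many instructions
 * the decoder would see in those 7 bytes.
 */
static size_t pad7(uint8_t *dst, enum pad_kind kind)
{
    const uint8_t *src = single_nops;
    size_t insns = 7;

    assert(dst != NULL);
    if (kind == PAD_NULL_LEA) {
        src = null_lea7;
        insns = 1;
    } else if (kind == PAD_LONG_NOP) {
        src = long_nop7;
        insns = 1;
    }
    memcpy(dst, src, 7);
    return insns;
}
```

The byte count is the same in every case; only the number of
instructions to decode differs (7 versus 1). Which one costs less is the
CPU-dependent question raised above, and the sketch does not answer it.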
>
> Fancy LEAs don't seem such a good choice for executed padding either.
> amd64 uses lots of REX prefixes instead of fancy instructions, since
> these are designed to have low overheads. They certainly aren't
> executed separately. On i386, the same technique with lots of older
> prefixes is not used much, probably because all prefixes have high
> overheads on old i386's. They can be as slow as NOPs although they
> aren't executed separately.
As a middle ground, what about using N of something really easy for
the decoder/branch predictor to grovel over, then a single int3 at
the end of the block, so that if we do fall into it we still end up
getting the desired effect?
nop's followed by an
> Bruce
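A minimal sketch of the two fill policies under discussion, assuming the
gap between functions is just a byte buffer. The function names and the
enum are hypothetical and are not taken from ld or from the commit; the
real change goes through the linker's emulation parameters, not through
code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { X86_NOP = 0x90, X86_INT3 = 0xCC };

/* Policy in r315522: every padding byte is INT3. */
static void fill_gap_int3(uint8_t *gap, size_t len)
{
    assert(gap != NULL || len == 0);
    if (len > 0)
        memset(gap, X86_INT3, len);
}

/*
 * Suggested middle ground: len-1 cheap NOPs, then one INT3 as the
 * last byte, so falling into the gap still traps before the next
 * function starts.
 */
static void fill_gap_nops_then_int3(uint8_t *gap, size_t len)
{
    assert(gap != NULL || len == 0);
    if (len == 0)
        return;
    memset(gap, X86_NOP, len - 1);
    gap[len - 1] = X86_INT3;
}
```

With the second policy, execution that enters the gap at any offset
slides through the NOPs and hits the INT3 as the last padding byte,
whereas the first policy traps on the first padding byte executed.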
--
Rod Grimes rgrimes at freebsd.org