svn commit: r303583 - head/sys/amd64/amd64

Konstantin Belousov kostikbel at gmail.com
Sun Jul 31 16:35:33 UTC 2016


On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:
> On Haswell, "rep stos" takes about 25 cycles to start up, and the function
> call overhead is in the noise.  25 cycles is a lot.  Haswell can move
> 32 bytes/cycle from L2 to L2, so it misses moving 800 bytes or 1/5 of a
> page in its startup overhead.  Oops, that is for "rep movs".  "rep stos"
> is similar.
> 
The commit message contained a probable explanation of why the change
demonstrated a measurable improvement under a non-microbenchmark load.

That said, the only thing I am answering and asking about here is the
above claim of a 25-cycle overhead for rep;stosq on hsw. I am curious how
the overhead was measured. Note: Agner Fog's tables state that fast mode
takes <2n uops and has a reciprocal throughput of 0.5n in the worst case,
and they do not demonstrate any setup overhead for hsw.
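
To make the figures under discussion concrete, here is a minimal sketch of
the arithmetic only; it measures nothing. The function names are
hypothetical, and it assumes a 4096-byte page zeroed with 8-byte stores and
that the "n" in the 0.5n figure is the repeat count.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that could have been moved during a fixed startup delay. */
uint64_t
startup_bytes_forgone(uint64_t startup_cycles, uint64_t bytes_per_cycle)
{
	return (startup_cycles * bytes_per_cycle);
}

/* Share of a page taken by 'bytes', in parts per thousand, rounded down. */
uint64_t
permille_of_page(uint64_t bytes, uint64_t page_size)
{
	assert(page_size > 0);
	return (bytes * 1000 / page_size);
}

/*
 * Worst-case cycles for n iterations under a reciprocal throughput of
 * 0.5n.  Assumption: n is the repeat count of the rep-prefixed store.
 */
uint64_t
worst_case_cycles(uint64_t n)
{
	return (n / 2);
}
```

With the numbers quoted above, startup_bytes_forgone(25, 32) is 800 and
permille_of_page(800, 4096) is 195, i.e. roughly 1/5 of a page.
worst_case_cycles(4096 / 8) is 256, so a 25-cycle startup would be about a
tenth of the worst-case time for a whole page under these assumptions.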

