svn commit: r334419 - head/sys/amd64/amd64

Mateusz Guzik mjguzik at gmail.com
Thu May 31 15:22:28 UTC 2018


On Thu, May 31, 2018 at 09:19:58PM +1000, Bruce Evans wrote:
> On Thu, 31 May 2018, Mateusz Guzik wrote:
>
> > Log:
> >  amd64: switch pagecopy from non-temporal stores to rep movsq
>
> As for pagezero, this pessimizes for machines with slow movsq and/or
> caches (mostly older machines).
>

Can you give examples of such machines? I tested with old yellers like
Nehalem and Westmere, no loss.

> >  The copied data is accessed in part soon after and it results with
> >  additional cache misses during a -j 1 buildkernel WITHOUT_CTF=yes
> >  KERNFAST=1, as measured with pmc stat.
>
> Of course it causes more cache misses later, but for large data going
> through slow caches is much slower so the cache misses later cost less.
>

The note was mainly for people who would defend nt stores by claiming
that they avoid evicting cached data in favor of data which is copied
and then mostly not accessed.

As for speed diff, see above.

> It is negatively useful to write this in asm.  This is now just memcpy()
> and the asm version of that is fast enough, though movsq takes too long
> to start up.  This memcpy() might be inlined and then it would be
> insignificantly faster than the function call.  __builtin_memcpy() won't
> actually inline it, since its size is large and compilers know that they
> don't understand memory.
>

It is true that pagecopy could currently just call the existing memcpy
with almost no loss.

However, even on a kernel with #define memcpy __builtin_memcpy, there
are plenty of calls with very small sizes. See the list here (taken
during buildkernel):

https://people.freebsd.org/~mjg/bufsizes.txt

In particular you can find a lot of entries with sizes < 64.

Spinning up rep stosb for such sizes even with ERMS turns out to be
pessimal even on Skylake. In other words, the primitive will need to get
special casing for small-sized callers. Known big-size callers should be
moved to something else. As such, pointing pagecopy at the primitive is
imo a bad idea.
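
To illustrate what special casing small sizes could look like, here is
a minimal sketch. It is not the kernel's memcpy. The names copy_sized
and copy_bulk and the cutoff SMALL_COPY_MAX are hypothetical, and the
value 64 is only borrowed from the "< 64" remark above, not measured.
On non-amd64 builds the bulk path falls back to plain memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff, not a tuned value. */
#define SMALL_COPY_MAX 64

/* Bulk path: rep movsb on amd64, plain memcpy elsewhere. */
static void
copy_bulk(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* The ABI guarantees the direction flag is clear on entry. */
	__asm__ volatile("rep movsb"
	    : "+D" (dst), "+S" (src), "+c" (len)
	    :
	    : "memory");
#else
	memcpy(dst, src, len);
#endif
}

/* Hypothetical primitive: short copies bypass the string instruction. */
static void *
copy_sized(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (len < SMALL_COPY_MAX) {
		/* Plain byte loop, no rep startup cost. */
		while (len-- > 0)
			*d++ = *s++;
		return (dst);
	}
	copy_bulk(dst, src, len);
	return (dst);
}
```

A caller with a known large, fixed size (such as a page copy) would
skip the size check entirely and use its own routine, which is the
reason for not pointing pagecopy at a primitive like this.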

As was noted elsewhere the current ifunc support has an undesirable
property of generating indirect calls. Whatever happens next (this gets
fixed or abandoned perhaps), there will be a way to select appropriate
routines at boot time.
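
For illustration only, the simplest form of boot-time selection is a
function pointer set once during startup. Everything named here
(pagecopy_select, pagecopy_sketch, pagecopy_byteloop, PAGE_SIZE_SKETCH)
is hypothetical and the byte loop merely stands in for some
microarchitecture-specific variant. Note that this form still performs
an indirect call on every use, which is exactly the property described
as undesirable above; a real mechanism might instead patch call sites.

```c
#include <stddef.h>
#include <string.h>

/* Assumed page size for the sketch. */
#define PAGE_SIZE_SKETCH 4096

typedef void (*pagecopy_fn)(void *dst, const void *src);

/* Default variant. */
static void
pagecopy_generic(void *dst, const void *src)
{
	memcpy(dst, src, PAGE_SIZE_SKETCH);
}

/* Stand-in for a variant preferred on some specific microarchitecture. */
static void
pagecopy_byteloop(void *dst, const void *src)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < PAGE_SIZE_SKETCH; i++)
		d[i] = s[i];
}

/* Selected once at "boot"; defaults to the generic routine. */
static pagecopy_fn pagecopy_impl = pagecopy_generic;

/* Hypothetical hook: a real kernel would look at CPU identification. */
static void
pagecopy_select(int prefers_alt)
{
	pagecopy_impl = prefers_alt ? pagecopy_byteloop : pagecopy_generic;
}

/* Callers use this wrapper; the call through the pointer is indirect. */
static void
pagecopy_sketch(void *dst, const void *src)
{
	pagecopy_impl(dst, src);
}
```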

If you know of specific amd64 microarchs which benefit from nt stores in
either pagezero or pagecopy, we can just special case them later.

In the meantime, I find the current change to be in the right direction.

-- 
Mateusz Guzik <mjguzik gmail.com>

