svn commit: r357805 - head/sys/amd64/include

Mateusz Guzik mjguzik at gmail.com
Wed Feb 12 17:46:17 UTC 2020


On 2/12/20, Gleb Smirnoff <glebius at freebsd.org> wrote:
> On Wed, Feb 12, 2020 at 11:12:14AM +0000, Mateusz Guzik wrote:
> M> Author: mjg
> M> Date: Wed Feb 12 11:12:13 2020
> M> New Revision: 357805
> M> URL: https://svnweb.freebsd.org/changeset/base/357805
> M>
> M> Log:
> M>   amd64: store per-cpu allocations subtracted by __pcpu
> M>
> M>   This eliminates a runtime subtraction from counter_u64_add.
> M>
> M>   before:
> M>   mov    0x4f00ed(%rip),%rax        # 0xffffffff80c01788 <numfullpathfail4>
> M>   sub    0x808ff6(%rip),%rax        # 0xffffffff80f1a698 <__pcpu>
> M>   addq   $0x1,%gs:(%rax)
> M>
> M>   after:
> M>   mov    0x4f02fd(%rip),%rax        # 0xffffffff80c01788 <numfullpathfail4>
> M>   addq   $0x1,%gs:(%rax)
> M>
> M>   Reviewed by:	jeff
> M>   Differential Revision:	https://reviews.freebsd.org/D23570
>
> Neat optimization! Thanks. Why didn't we do it back when created counter?
>
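
A rough C-level sketch of what the quoted change amounts to -- not the
actual sys/amd64/include/counter.h code; the function names, types, and
asm constraints here are simplified for illustration:

    #include <stdint.h>

    /* Simplified stand-in; the real kernel uses struct pcpu and counter_u64_t. */
    extern char __pcpu[];

    /*
     * Before: the counter handle is an address inside the per-CPU region,
     * so every increment subtracts __pcpu at runtime to form the
     * %gs-relative offset -- the extra sub in the "before" disassembly.
     */
    static inline void
    counter_add_before(uint64_t *c, int64_t inc)
    {
            __asm __volatile("addq %1,%%gs:(%0)"
                : : "r" ((char *)c - __pcpu), "ri" (inc) : "memory", "cc");
    }

    /*
     * After: the allocator stores the handle with __pcpu already
     * subtracted, so the increment uses it directly and the runtime
     * subtraction disappears.
     */
    static inline void
    counter_add_after(uint64_t *c, int64_t inc)
    {
            __asm __volatile("addq %1,%%gs:(%0)"
                : : "r" (c), "ri" (inc) : "memory", "cc");
    }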

Don't look at me, I did not work on it.

You can top it for counters like the above -- most actual counters are
known to exist at compilation time and they never disappear. Meaning
that in the simplest case they can just be part of one big array in
struct pcpu. Then the assembly could be reduced to addq $0x1,%gs:(someoffset),
removing the mov that loads the address -- faster single-threaded and
less cache use.
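
A minimal sketch of that idea, assuming a hypothetical pc_counters[]
array embedded in struct pcpu (the array name, its size, and the slot
macro are made up for illustration); the point is that the offset is a
compile-time constant, so the mov that loads the counter address goes away:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical layout: statically known counters live inside struct pcpu. */
    struct pcpu_sketch {
            /* ... existing per-CPU fields ... */
            uint64_t        pc_counters[64];        /* one slot per static counter */
    };

    /* Hypothetical fixed slot for a counter such as numfullpathfail4. */
    #define NUMFULLPATHFAIL4_SLOT   0

    /*
     * Increment a statically allocated per-CPU counter.  The "i" constraint
     * requires a compile-time constant offset and %c0 prints it without the
     * '$' prefix, so this compiles to a single addq $0x1,%gs:<offset>
     * with no address load.
     */
    #define STATIC_COUNTER_INC(slot)                                    \
            __asm __volatile("addq $1,%%gs:%c0"                         \
                : : "i" (offsetof(struct pcpu_sketch, pc_counters) +    \
                    (slot) * sizeof(uint64_t))                          \
                : "memory", "cc")

Used as STATIC_COUNTER_INC(NUMFULLPATHFAIL4_SLOT), each hit is a single
instruction; the trade-off is that the set of counters (and the array size)
has to be fixed at compile time, which is exactly the "known at compilation
time and never disappear" case described above.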

I'm confident I noted this at least a few times.

-- 
Mateusz Guzik <mjguzik gmail.com>

