svn commit: r367813 - head/lib/libutil

Stefan Esser se at freebsd.org
Wed Nov 18 23:25:35 UTC 2020


Am 18.11.20 um 23:39 schrieb Jessica Clarke:
> On 18 Nov 2020, at 22:32, Stefan Esser <se at freebsd.org> wrote:
>>
>> Am 18.11.20 um 22:40 schrieb Mateusz Guzik:
>>>> +{
>>>> +	static const int localbase_oid[2] = {CTL_USER, USER_LOCALBASE};
>>> There is no use for this to be static.
>>
>> Why not? This makes it part of the constant initialized data of
>> the library, which is more efficient under run-time and memory
>> space aspects than initializing them on the stack.
>>
>> What do I miss?
> 
> What is more efficient is not so clear-cut, it depends on things like
> the architecture, microarchitecture and ABI. Allocating a small buffer
> on the stack is extremely cheap (the memory will almost certainly be in
> the L1 cache), whereas globally-allocated storage is less likely to be
> in the cache due to being spread out, and on some architecture/ABI
> combinations will incur additional indirection through the GOT. Also, 8
> bytes of additional stack use is lost in the noise.

The memory latency of the extra access to the constant will be hidden in 
the noise. The data will probably be in a page that has already been
accessed (so no TLB load penalty) and modern CPUs will be able to deal
with the cache miss (if any, because the cache line may already be
loaded depending on other data near-by).

Yes, I do agree that a stack local variable could have been used and
it could have been created with little effort by a modern multi-issue
CPU. The code size would have been larger, though, by some 10 to 20
bytes, I'd assume - but I doubt a single path through this code is
measurable, much less observable in practice.

We are talking about nano-seconds here (even if the cache line did
not contain the constant data, it would probably be accessed just a
few instructions further down and incur the same latency then).

I have followed CPU development over more than 45 years and know many
architectures and their specifics, but the time were I have programmed
drivers in assembly and counted instruction cycles is long gone.

This is nitpicking at a level that I do not want to continue. I'm not
opposed to achieving efficiency where it is relevant. This function is
providing useful functionality and I do not mind a wasted microsecond,
it is not relevant here. (This was different if it was wasted within
a tight loop - but it is not, it is typically called once if at all).

Feel free to replace my code with your implementation, if you think it
is not acceptable as written by me.

I just wanted to provide an implementation of this functionality to
be used in a number of programs where other developers had expressed
interest in such a feature (and one of these programs has been worked
on by me in recent weeks, so I'm now able to make use of it myself).

Regards, STefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 495 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20201119/70348a2d/attachment.sig>


More information about the svn-src-head mailing list