svn commit: r367813 - head/lib/libutil

Thu Nov 19 00:37:58 UTC 2020

On 11/19/20, Stefan Esser <se at freebsd.org> wrote:
> Am 18.11.20 um 23:39 schrieb Jessica Clarke:
>> On 18 Nov 2020, at 22:32, Stefan Esser <se at freebsd.org> wrote:
>>>
>>> Am 18.11.20 um 22:40 schrieb Mateusz Guzik:
>>>>> +{
>>>>> +	static const int localbase_oid[2] = {CTL_USER, USER_LOCALBASE};
>>>> There is no use for this to be static.
>>>
>>> Why not? This makes it part of the constant initialized data of
>>> the library, which is more efficient under run-time and memory
>>> space aspects than initializing them on the stack.
>>>
>>> What do I miss?
>>
>> What is more efficient is not so clear-cut, it depends on things like
>> the architecture, microarchitecture and ABI. Allocating a small buffer
>> on the stack is extremely cheap (the memory will almost certainly be in
>> the L1 cache), whereas globally-allocated storage is less likely to be
>> in the cache due to being spread out, and on some architecture/ABI
>> combinations will incur additional indirection through the GOT. Also, 8
>> bytes of additional stack use is lost in the noise.
>
> The memory latency of the extra access to the constant will be hidden in
> the noise. The data will probably be in a page that has already been
> accessed (so no TLB load penalty) and modern CPUs will be able to deal
> with the cache miss (if any, because the cache line may already be
> loaded depending on other data near-by).
>
> Yes, I do agree that a stack local variable could have been used and
> it could have been created with little effort by a modern multi-issue
> CPU. The code size would have been larger, though, by some 10 to 20
> bytes, I'd assume - but I doubt a single path through this code is
> measurable, much less observable in practice.
>
> We are talking about nano-seconds here (even if the cache line did
> not contain the constant data, it would probably be accessed just a
> few instructions further down and incur the same latency then).
>
> I have followed CPU development over more than 45 years and know many
> architectures and their specifics, but the time were I have programmed
> drivers in assembly and counted instruction cycles is long gone.
>
> This is nitpicking at a level that I do not want to continue. I'm not
> opposed to achieving efficiency where it is relevant. This function is
> providing useful functionality and I do not mind a wasted microsecond,
> it is not relevant here. (This was different if it was wasted within
> a tight loop - but it is not, it is typically called once if at all).
>
> Feel free to replace my code with your implementation, if you think it
> is not acceptable as written by me.
>
> I just wanted to provide an implementation of this functionality to
> be used in a number of programs where other developers had expressed
> interest in such a feature (and one of these programs has been worked
> on by me in recent weeks, so I'm now able to make use of it myself).
>

The entire localbase saga is getting way out of hand.

To address this e-mail and things you wrote in another reply, my
comlaints are very simple and are not getting less valid for not being
raised sooner. I just was not watching any of this until recent
fallout.

For the change at hand, there has to be a reason to use a static
symbol. Standard example is catching otherwise expensive to obtain
data. For that the static localbase pointer makes perfect sense, while
the static lookup array does not have any justification that I see.
Bringing cache, TLB or whatever microarchitectural details into the
discussion is beyond not warranted.

More importantly though the commit comes with a self-confessed memory
leak and is a show stopper.

That said, I'll see about patching this up.

-- 
Mateusz Guzik <mjguzik gmail.com>