svn commit: r227812 - head/lib/libc/string

Thu Nov 24 05:17:49 UTC 2011

On Tue, 22 Nov 2011, Eitan Adler wrote:

> Meta comment:
> I should have sent this patch to -hackers prior to seeking approval to
> commit. I learned my lesson and will seek wider review before making
> such changes in the future.
> On Tue, Nov 22, 2011 at 6:21 AM, Bruce Evans <brde at optusnet.com.au> wrote:
> I saw your email about the style changes. I'd like to flesh out what
> changes should be made first before making another commit>> I guess
> I'm a little confused. Â Do we really have profiling>> information at
> this level that suggests the overhead of the branch is>> significant?
> I thought most hardware had pretty good>> branch-prediction,
> particularly with speculative execution.
> I made this change at the direction of theraven at . I assume he knows a
> little bit more about compilers than I do. ;)It seems from the rest of
> this thread that even if this were an optimization it still shouldn't
> be done at source code code level.
>> Every time I have profiled the string functions, changes like this always> have an insignificant effect, even in benchmarks. Â Normal code spends> perhaps 0.01% of its time calling string functions, so the improvements> from optimizing string functions are 0.01% of an insignificant amount.

Maybe my mail client (unmentionable) is messing up something.  This came
back with the quotes even more mangled than usual, especially for my
reply.

> The problem with profiling this type of change is that it is hard to
> find a good representative benchmark. I could easily write code that
> will show you that adding the equality check is a good idea or that it
> is a horrible idea. IMHO it saves enough time when they are equal, but
> loses almost no time when the strings are not equal.
>>> Wouldn't something like __predict_false() have more value for>> performance,
> It seems that at most this couldn't hurt.
> On Tue, Nov 22, 2011 at 10:33 AM, David Schultz <das at freebsd.org> wrote:
>> Third, it's not clear that checking whether s1 == s2 is even an
>> optimization. Â Most programs simply aren't going to pass identical
>> pointers as arguments to strcmp(), so for the overwhelming
>> majority of cases, this is just a wasted test and a wasted slot in
>> a branch predict table. Â (FWIW, I doubt that a realistic benchmark
>> would demonstrate any measurable difference.)
>
> It is fairly common for programs to compare a value passed to a
> function to a array of strings. Also note that for the strn*cmp
> functions we already have a branch so this isn't always a wasted slot.

But das's mail never has these problems, even when you quote it.

> Here is what I'd like to do next:
>
> - fix bde@'s style nits
> - change the | to a || and remove the comment
> - but leave the equality check as is.
> - find a src committer to approve the patch
> - go back to working on ports for a while ;)
>
> Is this the right course of action? Or should I just revert both
> commits entirely?

I would revert.

Another point that I forgot to mention in a previous reply is that many
of the MI implementations of string functions aren't actually used on
primary arches, so optimizing them has no effect instead of 0.01% of
an insignificant effect.  This applies to at least strcmp() in the
recent changes:
- amd64 has a sophisticated MD strcmp.  It's sophistication didn't
   include checking for equal pointers, and the recent changes didn't
   touch it.  "sophisticated" may mean over-engineered.
- i386 has an "optimized" MD strcmp.  It uses 8-way unrolling.  This
   may have been good when it was written for the original i386 or i486,
   but it is now dubious or worse.  Its optimizations didn't include
   checking for equal pointers, and the recent changes didn't touch it.
This doesn't apply to most of the other changed functions.  The efficiency
of the others was considered so unimportant that no one ever tried to
optimize them for amd64; i386 has MD "optimized" versions for many more
string functions than does i386, perhaps including a couple of the changed
ones.

One of the changed functions is strcasecmp().  This can't possibly be
signifcantly optimized at the MD level, since it does locale stuff.
The bloat in the locale stuff in it was expanded recently, so the
constant factor for the O(N) in it expanded and the pointer-equality
optimization does more.  But if the optimality of strcasecmp() mattered,
then it would have been thinner and its expansion would have generated
more mail.

Bruce