[Bug 255840] __get_locale() is inefficient
bugzilla-noreply at freebsd.org
Thu May 13 13:25:20 UTC 2021
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255840
Bug ID: 255840
Summary: __get_locale() is inefficient
Product: Base System
Version: CURRENT
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: bin
Assignee: bugs at FreeBSD.org
Reporter: markj at FreeBSD.org
In libc we have:
    /**
     * Returns the current locale for this thread, or the global locale if none is
     * set. The caller does not have to free the locale. The return value from
     * this call is not guaranteed to remain valid after the locale changes. As
     * such, this should only be called within libc functions.
     */
    static inline locale_t __get_locale(void)
    {

        if (!__has_thread_locale) {
            return (&__xlocale_global_locale);
        }
        return (__thread_locale ? __thread_locale : &__xlocale_global_locale);
    }
Here, __has_thread_locale and __xlocale_global_locale are global variables, and
__thread_locale is a thread-local variable. In the common case,
!__has_thread_locale is true.
This function is called any time MB_CUR_MAX is evaluated, which may happen
frequently (see PR 255551 for an example).
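To make the intended fast path concrete, here is a small self-contained model of the lookup. Every name in it (struct toy_locale, toy_global_locale, toy_has_thread_locale, toy_thread_locale, toy_get_locale(), toy_mb_cur_max(), toy_uselocale()) is hypothetical and only stands in for libc's private state. The model ignores the synchronization a real implementation needs, and it assumes that evaluating MB_CUR_MAX amounts to looking up the current locale and reading one field from it.

```c
#include <stddef.h>

/* Hypothetical stand-in for libc's private locale structure. */
struct toy_locale {
    int mb_cur_max;                 /* what MB_CUR_MAX would report */
};

/* Hypothetical stand-ins for __xlocale_global_locale, __has_thread_locale
 * and __thread_locale. */
static struct toy_locale toy_global_locale = { 1 };
static int toy_has_thread_locale;  /* set once any thread uses toy_uselocale() */
static _Thread_local struct toy_locale *toy_thread_locale;

/* Same decision structure as __get_locale(). */
static inline struct toy_locale *
toy_get_locale(void)
{
    if (!toy_has_thread_locale)
        return (&toy_global_locale);   /* common case: no TLS access needed */
    return (toy_thread_locale ? toy_thread_locale : &toy_global_locale);
}

/* Rough analogue of what evaluating MB_CUR_MAX does. */
int
toy_mb_cur_max(void)
{
    return (toy_get_locale()->mb_cur_max);
}

/* Hypothetical analogue of uselocale(3) for this model (no synchronization). */
void
toy_uselocale(struct toy_locale *loc)
{
    toy_has_thread_locale = 1;
    toy_thread_locale = loc;
}
```

As long as nothing calls toy_uselocale(), toy_mb_cur_max() never needs toy_thread_locale, which is the behaviour one would hope the compiled code has in the common case.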
On main, __get_locale() compiles to this:
    0x000000080115e300 <+0>: push %rbp
    0x000000080115e301 <+1>: mov %rsp,%rbp
    0x000000080115e304 <+4>: push %rbx
    0x000000080115e305 <+5>: push %rax
    0x000000080115e306 <+6>: mov 0x113fbb(%rip),%rbx # 0x8012722c8
    0x000000080115e30d <+13>: data16 lea 0x113fa3(%rip),%rdi # 0x8012722b8
    0x000000080115e315 <+21>: data16 data16 rex.W call 0x8012654b0 <__tls_get_addr@plt>
    0x000000080115e31d <+29>: mov (%rax),%rax
    0x000000080115e320 <+32>: test %rax,%rax
    0x000000080115e323 <+35>: mov 0x113e6e(%rip),%rcx # 0x801272198
    0x000000080115e32a <+42>: cmove %rcx,%rax
    0x000000080115e32e <+46>: cmpl $0x0,(%rbx)
    0x000000080115e331 <+49>: cmove %rcx,%rax
    0x000000080115e335 <+53>: mov 0x18(%rax),%rax
    0x000000080115e339 <+57>: mov 0x70(%rax),%eax
    0x000000080115e33c <+60>: add $0x8,%rsp
    0x000000080115e340 <+64>: pop %rbx
    0x000000080115e341 <+65>: pop %rbp
    0x000000080115e342 <+66>: ret
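In C terms, the instruction sequence above roughly corresponds to the following eager, branch-free form. It reuses the hypothetical toy_* names of a self-contained model (redeclared here so the block compiles on its own) and is only a description of what the generated code does: the thread-local slot is read before the flag is examined, and two conditional selects (the two cmove instructions) pick the result. A compiler is free to turn this C back into branches, or the branching source into this shape.

```c
#include <stddef.h>

/* Hypothetical stand-ins for libc's private locale state. */
struct toy_locale {
    int mb_cur_max;
};

static struct toy_locale toy_global_locale = { 1 };
static int toy_has_thread_locale;
static _Thread_local struct toy_locale *toy_thread_locale;

/*
 * Approximation of the compiled code: the thread-local variable is read
 * unconditionally (in a shared library this is the __tls_get_addr call),
 * then the result is chosen without branching.
 */
int
toy_mb_cur_max_eager(void)
{
    struct toy_locale *tl = toy_thread_locale;     /* always evaluated */
    struct toy_locale *loc = (tl != NULL) ? tl : &toy_global_locale;  /* first cmove */

    loc = toy_has_thread_locale ? loc : &toy_global_locale;           /* second cmove */
    return (loc->mb_cur_max);
}
```

The result is the same as the branching version; the difference is that the cost of the thread-local access is paid on every call.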
In particular, the address of __thread_locale is obtained even when it isn't
going to be used, i.e., when no thread has set a per-thread locale using
uselocale(3). Obtaining this address requires a call into rtld
(__tls_get_addr), and that call has a significant cost: a program that
evaluates the comparison MB_CUR_MAX == 1 a total of 500,000,000 times runs in
about 2.7s on my workstation. With libc modified to move the test of
__thread_locale into a separate function, the runtime drops to 1.0s.
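The report does not include the patch, but splitting the test of __thread_locale into a separate function could look roughly like this sketch, again written against a hypothetical toy_* model rather than libc's real code. The slow path is marked __attribute__((noinline)) (a GCC/Clang extension) so that the thread-local access stays behind a call that is only made after the flag check; this is an assumption about how such a split achieves its effect, not a description of the actual change.

```c
/* Hypothetical stand-ins for libc's private locale state. */
struct toy_locale {
    int mb_cur_max;
};

static struct toy_locale toy_global_locale = { 1 };
static int toy_has_thread_locale;
static _Thread_local struct toy_locale *toy_thread_locale;

/*
 * Slow path, kept out of line. Keeping the thread-local access in a
 * separate non-inlined function makes it much less likely that the
 * compiler evaluates it before the flag check in the caller.
 */
static __attribute__((noinline)) struct toy_locale *
toy_get_thread_locale(void)
{
    return (toy_thread_locale ? toy_thread_locale : &toy_global_locale);
}

/* Fast path: only the flag is tested when no per-thread locale exists. */
static inline struct toy_locale *
toy_get_locale(void)
{
    if (!toy_has_thread_locale)
        return (&toy_global_locale);
    return (toy_get_thread_locale());
}

/* Rough analogue of what evaluating MB_CUR_MAX does. */
int
toy_mb_cur_max(void)
{
    return (toy_get_locale()->mb_cur_max);
}
```

In the common case the caller only checks the flag before returning the global locale; the thread-local lookup, and with it any trip through __tls_get_addr in a shared library, happens only once some thread has set a per-thread locale.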
I'm not quite sure why clang compiles __get_locale() this way. I presume it's
to avoid branches (note the two cmove instructions), but it's quite suboptimal.
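The benchmark program itself is not part of the report, but a measurement along the lines described (evaluating MB_CUR_MAX == 1 in a tight loop) might look like the following. The helper names count_single_byte() and time_single_byte() are hypothetical; MB_CUR_MAX and clock_gettime(CLOCK_MONOTONIC, ...) are standard. Because nothing calls setlocale(3), the program runs in the C locale, where MB_CUR_MAX is 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical benchmark helper: evaluates MB_CUR_MAX == 1 `iters` times
 * and returns how many evaluations were true. The volatile counter keeps
 * the loop body from being discarded.
 */
unsigned long
count_single_byte(unsigned long iters)
{
    volatile unsigned long hits = 0;

    for (unsigned long i = 0; i < iters; i++) {
        if (MB_CUR_MAX == 1)
            hits++;
    }
    return (hits);
}

/* Wall-clock seconds spent in count_single_byte(iters). */
double
time_single_byte(unsigned long iters)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    (void)count_single_byte(iters);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ((double)(end.tv_sec - start.tv_sec) +
        (double)(end.tv_nsec - start.tv_nsec) / 1e9);
}
```

Calling time_single_byte(500000000UL) from a main() reproduces the shape of the experiment; the 2.7s and 1.0s figures come from the reporter's workstation and are not something this sketch guarantees. It is worth checking the generated code to confirm that MB_CUR_MAX is still evaluated inside the loop rather than hoisted out of it.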
More information about the freebsd-bugs mailing list