__tls_get_addr problem with recent current

Tue Sep 2 19:39:38 UTC 2008

On Sep 2, 2008, at 2:44 PM, Kostik Belousov wrote:

> On Tue, Sep 02, 2008 at 02:41:08PM -0400, Adam Jacob Muller wrote:
>>
>> On Sep 1, 2008, at 10:53 AM, Kostik Belousov wrote:
>>
>>> On Mon, Sep 01, 2008 at 05:33:37PM +0300, Vyacheslav Bocharov wrote:
>>>> I have a similar problem on 7-STABLE (as of 1 Sep):
>>>> a 32-bit application execs a 64-bit application and we get a core dump:
>>>>
>>>> # gdb fw.sh fw.sh.core
>>>> GNU gdb 6.1.1 [FreeBSD]
>>>> Copyright 2004 Free Software Foundation, Inc.
>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>> Type "show copying" to see the conditions.
>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>> This GDB was configured as "amd64-marcel-freebsd"...
>>>> Core was generated by `fw.sh'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> Reading symbols from /usr/lib/libstdc++.so.6...done.
>>>> Loaded symbols for /usr/lib/libstdc++.so.6
>>>> Reading symbols from /lib/libm.so.5...done.
>>>> Loaded symbols for /lib/libm.so.5
>>>> Reading symbols from /lib/libgcc_s.so.1...done.
>>>> Loaded symbols for /lib/libgcc_s.so.1
>>>> Reading symbols from /lib/libc.so.7...done.
>>>> Loaded symbols for /lib/libc.so.7
>>>> Reading symbols from /libexec/ld-elf.so.1...done.
>>>> Loaded symbols for /libexec/ld-elf.so.1
>>>> #0  0x0000000800507483 in __tls_get_addr () from /libexec/ld-elf.so.1
>>>> (gdb) bt
>>>> #0  0x0000000800507483 in __tls_get_addr () from /libexec/ld-elf.so.1
>>>> #1  0x0000000800ad8892 in _pthread_mutex_init_calloc_cb () from /lib/libc.so.7
>>>> #2  0x0000000800ada35f in malloc () from /lib/libc.so.7
>>>> #3  0x00000008007050ad in operator new () from /usr/lib/libstdc++.so.6
>>>> #4  0x00000008006b5f21 in std::string::_Rep::_S_create ()
>>>>    from /usr/lib/libstdc++.so.6
>>>> #5  0x00000008006b6ca5 in std::string::_S_copy_chars ()
>>>>    from /usr/lib/libstdc++.so.6
>>>> #6  0x00000008006b6dc2 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string ()
>>>>    from /usr/lib/libstdc++.so.6
>>>> #7  0x00000000004021ec in __static_initialization_and_destruction_0 (
>>>>     __initialize_p=1, __priority=65535) at CCmdLine.cpp:16
>>>> #8  0x00000000004026c3 in global constructors keyed to cmdlist ()
>>>>     at CCmdLine.cpp:177
>>>> #9  0x00000000004033a2 in __do_global_ctors_aux ()
>>>> #10 0x000000000040113e in _init ()
>>>> #11 0x0000000800b2b0c0 in __cxa_atexit () from /lib/libc.so.7
>>>> #12 0x00000000004014e8 in _start ()
>>>> #13 0x000000080052c000 in ?? ()
>>>>
>>>> I tried your patch but nothing changed.
>>> Exactly which patch? There were three, one of which caused an immediate
>>> panic. I put the patches at
>>> http://people.freebsd.org/~kib/misc/fsbase.1.patch
>>> http://people.freebsd.org/~kib/misc/fsbase.2.patch
>>>
>>> Could you please try both and report the results?
>>> An isolated test case, either as several C files or as a recipe to
>>> reproduce this with the base system, would be ideal.
>>>
>>>>
>>>> 2008/8/31 Kostik Belousov <kostikbel at gmail.com>
>>>>
>>>>> On Sun, Aug 31, 2008 at 10:16:18AM +0300, Kostik Belousov wrote:
>>>>>> On Sat, Aug 30, 2008 at 02:03:00PM -0700, Artem Belevich wrote:
>>>>>>> With the new patch the kernel crashed as soon as I ran an i386 app,
>>>>>>> though the crash happened in the in-kernel thread g_up:
>>>>>>>
>>>>>>> Fatal trap 12: page fault while in kernel mode
>>>>>>> cpuid = 2; apic id = 02
>>>>>>> fault virtual address   = 0x20
>>>>>>> fault code              = supervisor read data, page not present
>>>>>>> instruction pointer     = 0x8:0xffffffff804a821f
>>>>>>> stack pointer           = 0x10:0xffffffffac280b60
>>>>>>> frame pointer           = 0x10:0x0
>>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>>                     = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>>> processor eflags        = resume, IOPL = 0
>>>>>>> current process         = 3 (g_up)
>>>>>>> trap number             = 12
>>>>>>> panic: page fault
>>>>>>> cpuid = 2
>>>>>>> Uptime: 37s
>>>>>>> Physical memory: 8169 MB
>>>>>>> Dumping 380 MB: 365 349 333 317 301 285 269 253 237 221 205 189 173 157 141 125 109 93 77 61 45 29 13
>>>>>> Could you please show me the disassembled code around the faulting
>>>>>> %rip?
>>>>>
>>>>> No need, it seems I found the problem. I trashed %rdx, which contains
>>>>> the third cpu_switch argument. Please try the updated patch.
>>>>>
>>>>> Thanks for testing!
>>>>>
>>>>> diff --git a/sys/amd64/amd64/cpu_switch.S b/sys/amd64/amd64/cpu_switch.S
>>>>> index f34b0cc..03f0eca 100644
>>>>> --- a/sys/amd64/amd64/cpu_switch.S
>>>>> +++ b/sys/amd64/amd64/cpu_switch.S
>>>>> @@ -249,6 +249,12 @@ store_seg:
>>>>> 1:     movl    %ds,PCB_DS(%r8)
>>>>>     movl    %es,PCB_ES(%r8)
>>>>>     movl    %fs,PCB_FS(%r8)
>>>>> +       movq    %rdx,%r11
>>>>> +       movl    $MSR_FSBASE,%ecx
>>>>> +       rdmsr
>>>>> +       shlq    $32,%rdx
>>>>> +       leaq    (%rax,%rdx),%r9
>>>>> +       movq    %r11,%rdx
>>>>>      jmp     done_store_seg
>>>>> 2:     movq    PCB_GS32P(%r8),%rax
>>>>>     movq    (%rax),%rax
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Vyacheslav Bocharov
>>
>>
>>
>> Hi,
>> I have this same issue on recent RELENG_7 (both before and after
>> 7.1-PRERELEASE). It is reproducible with a simple C program compiled on
>> 32-bit 7.x:
>>
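>> #include <unistd.h>
>>
>> int
>> main(void)
>> {
>>   /* Exec the 64-bit /bin/ls from this 32-bit binary. */
>>   execl("/bin/ls", "/bin/ls", (char *) 0);
>>   return (1);	/* reached only if execl() itself fails */
>> }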
>>
>> This app segfaults rather reliably (but not 100% of the time) when run in a loop:
>> (while true;do ./test; if [ "$?" -gt "0" ];then break; fi; done).
>>
>> Patch 1 (http://people.freebsd.org/~kib/misc/fsbase.1.patch) fixes the
>> issue for me.
>> Patch 2 (http://people.freebsd.org/~kib/misc/fsbase.2.patch) does not,
>> though it may mitigate it slightly (things crash less frequently).
>
> The patch below was committed to CURRENT; it should address your issue.
>
> diff --git a/sys/amd64/amd64/cpu_switch.S b/sys/amd64/amd64/cpu_switch.S
> index f34b0cc..a0b11f8 100644
> --- a/sys/amd64/amd64/cpu_switch.S
> +++ b/sys/amd64/amd64/cpu_switch.S
> @@ -109,8 +109,24 @@ ENTRY(cpu_switch)
> 	movq	%rsp,PCB_RSP(%r8)
> 	movq	%rbx,PCB_RBX(%r8)
> 	movq	%rax,PCB_RIP(%r8)
> -	movq	PCB_FSBASE(%r8),%r9
> -	movq	PCB_GSBASE(%r8),%r10
> +
> +	/*
> +	 * Reread fs and gs bases. Explicit fs segment register load
> +	 * by the usermode code may change actual fs base without
> +	 * updating pcb_{fs,gs}base.
> +	 *
> +	 * %rdx still contains the mtx, save %rdx around rdmsr.
> +	 */
> +	movq	%rdx,%r11
> +	movl	$MSR_FSBASE,%ecx
> +	rdmsr
> +	shlq	$32,%rdx
> +	leaq	(%rax,%rdx),%r9
> +	movl	$MSR_KGSBASE,%ecx
> +	rdmsr
> +	shlq	$32,%rdx
> +	leaq	(%rax,%rdx),%r10
> +	movq	%r11,%rdx
>
> 	testl	$PCB_32BIT,PCB_FLAGS(%r8)
> 	jnz	store_seg
> diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> index 06c0803..f3c41f7 100644
> --- a/sys/amd64/amd64/machdep.c
> +++ b/sys/amd64/amd64/machdep.c
> @@ -734,6 +734,7 @@ exec_setregs(td, entry, stack, ps_strings)
> 	pcb->pcb_fsbase = 0;
> 	pcb->pcb_gsbase = 0;
> 	critical_exit();
> +	pcb->pcb_flags &= ~(PCB_32BIT | PCB_GS32BIT);
> 	load_ds(_udatasel);
> 	load_es(_udatasel);
> 	load_fs(_udatasel);
> diff --git a/sys/amd64/ia32/ia32_signal.c b/sys/amd64/ia32/ia32_signal.c
> index 9e98656..162dcf9 100644
> --- a/sys/amd64/ia32/ia32_signal.c
> +++ b/sys/amd64/ia32/ia32_signal.c
> @@ -742,5 +742,6 @@ ia32_setregs(td, entry, stack, ps_strings)
>
> 	/* Return via doreti so that we can change to a different %cs */
> 	pcb->pcb_flags |= PCB_FULLCTX | PCB_32BIT;
> +	pcb->pcb_flags &= ~PCB_GS32BIT;
> 	td->td_retval[1] = 0;
> }
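
For readers following the cpu_switch.S hunk quoted above: rdmsr returns the
64-bit MSR value split across %eax (low half) and %edx (high half), which is
why the patch saves %rdx in %r11 first and then rebuilds each base with
shlq/leaq. A minimal C sketch of that arithmetic follows. It is an
illustration only, not kernel code, and msr_join is a hypothetical name that
does not appear in the patch.

```c
#include <stdint.h>

/*
 * Illustration only, not kernel code. rdmsr leaves the low 32 bits of
 * the MSR in %eax and the high 32 bits in %edx. The patch combines them
 * with "shlq $32,%rdx" followed by "leaq (%rax,%rdx),%r9"; this
 * hypothetical helper performs the same arithmetic in C.
 */
uint64_t
msr_join(uint32_t eax, uint32_t edx)
{
	/* Shift the high half into place, then add the low half. */
	return (((uint64_t)edx << 32) + (uint64_t)eax);
}
```

With eax = 0x00507483 and edx = 0x8, the result is 0x0000000800507483, the
same shape as the 64-bit addresses in the backtrace quoted earlier.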

Will this be MFC'd to 7.1 before the release?

-Adam
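
P.S. Since the crash is intermittent, the shell loop quoted above can also be
written in C if a run count is wanted. This is only a sketch under stated
assumptions: run_until_failure is a hypothetical helper name, and it treats
any non-zero exit status or death by signal as a failure, like the shell
test on "$?" does.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any patch in this thread.
 * Runs argv[0] up to max_runs times. Returns 0 if every run exited
 * with status 0, the 1-based number of the first failing run (non-zero
 * exit or death by signal), or -1 if fork() or waitpid() failed.
 */
int
run_until_failure(char *const argv[], int max_runs)
{
	for (int i = 1; i <= max_runs; i++) {
		pid_t pid = fork();
		if (pid == -1)
			return (-1);
		if (pid == 0) {
			execv(argv[0], argv);
			_exit(127);	/* reached only if execv() failed */
		}
		int status;
		if (waitpid(pid, &status, 0) == -1)
			return (-1);
		/* A segfault shows up here as !WIFEXITED(status). */
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			return (i);
	}
	return (0);
}
```

Called with an argv of { "./test", NULL } and a limit of, say, 1000, a
non-zero return gives the run on which the reproducer first failed.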