Re: armv7-on-aarch64 stuck at urdlck: I got a replication of the "ampere2" bulk build hangup problem on a Windows DevKit 2023
Date: Sun, 21 Jul 2024 10:36:56 UTC
On Jul 20, 2024, at 16:42, Mark Millard <marklmi@yahoo.com> wrote:
> On Jul 20, 2024, at 01:57, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
>> [Everything and everybody in Cc: are stripped for good].
>>
>> On Fri, Jul 19, 2024 at 10:38:36PM -0700, Mark Millard wrote:
>>> 0x201375c0 - 0x2014092c is .bss in /lib/libthr.so.3
>>>
>>> (gdb) bt
>>> #0 0x201aeec0 in __pthread_map_stacks_exec () from /lib/libc.so.7
>>> #1 0x2005d1e4 in ?? () from /libexec/ld-elf.so.1
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> (gdb) disass
>>> Dump of assembler code for function __pthread_map_stacks_exec:
>>> => 0x201aeec0 <+0>: ldr r0, [pc, #8] @ 0x201aeed0 <__pthread_map_stacks_exec+16>
>>> 0x201aeec4 <+4>: add r0, pc, r0
>>> 0x201aeec8 <+8>: ldr r0, [r0, #156] @ 0x9c
>>> 0x201aeecc <+12>: bx r0
>>> 0x201aeed0 <+16>: andseq r6, r7, r4, lsr #12
>>> End of assembler dump.
>>>
>>
>> Do the following:
>> 1. Rebuild rtld/libc/libthr with the debugging info and no optimization,
>> i.e. ensure that flags are "-O0 -g" or "-Og -g" and not -O2. See
>> the first comment in libexec/rtld-elf/Makefile for the hint how to
>> do it.
>
> I did a full buildworld with "-Og -g" via temporary
> use of:
>
> diff --git a/share/mk/sys.mk b/share/mk/sys.mk
> index 44db9266784f..9c6c7ce575a4 100644
> --- a/share/mk/sys.mk
> +++ b/share/mk/sys.mk
> @@ -145,7 +145,8 @@ CC ?= c89
> CFLAGS ?= -O
> .else
> CC ?= cc
> -CFLAGS ?= -O2 -pipe
> +#CFLAGS ?= -O2 -pipe
> +CFLAGS ?= -Og -g -pipe
> .if defined(NO_STRICT_ALIASING)
> CFLAGS += -fno-strict-aliasing
> .endif
>
> I installed the result armv7 world into a
> directory tree and installed pkg and cairo.
>
>> 2. Reproduce the issue
>
> The dlopen_test.c based case does not fail under the world
> built with "-Og -g":
>
> # cc -g -std=c11 -pedantic -Wall -pthread dlopen_test.c ; ./a.out
> #
>
>> under gdb
>
> (gdb) run
> Starting program: /root/a.out
> [Inferior 1 (process 36680) exited normally]
> (gdb)
>
> So it does not reproduce in gdb when buildworld was based
> on "-Og -g".

I found another context that also fails and that has useful
debugger information. It avoids involving graphviz:
) a pkgbase install that I had around (pkgbase has debug information)
) also set up /home/pkgbuild/worktrees/main/ to refer to the /usr/src/ that pkgbase put in place
) pkg install cairo
) use of my simple dlopen program
(gdb) run
Starting program: /root/a.out
Catchpoint 7
Inferior loaded /lib/libgcc_s.so.1
/lib/libthr.so.3
/lib/libc.so.7
/lib/libsys.so.7
r_debug_state (rd=<optimized out>, m=<optimized out>) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4485
4485 }
(gdb) c
Continuing.
Breakpoint 3, get_program_var_addr (name=0x20042f2a "__progname", lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Breakpoint 3, get_program_var_addr (name=0x20043c97 "environ", lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Breakpoint 3, get_program_var_addr (name=0x20043c9f "__elf_aux_vector", lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Breakpoint 3, get_program_var_addr (name=0x200442e8 "__libc_atexit", lockstate=lockstate@entry=0xffffd668) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Catchpoint 7
Inferior loaded /usr/local/lib/libcairo.so.2
/usr/local/lib/libpixman-1.so.0
/usr/local/lib/libfontconfig.so.1
/usr/local/lib/libfreetype.so.6
/usr/local/lib/libEGL.so.1
/usr/lib/libdl.so.1
/usr/local/lib/libpng16.so.16
/usr/local/lib/libxcb-shm.so.0
/usr/local/lib/libxcb.so.1
/usr/local/lib/libxcb-render.so.0
/usr/local/lib/libXrender.so.1
/usr/local/lib/libX11.so.6
/usr/local/lib/libXext.so.6
/lib/libz.so.6
/usr/local/lib/libGL.so.1
/lib/libm.so.5
/usr/local/lib/libexpat.so.1
/usr/lib/libbz2.so.4
/usr/local/lib/libbrotlidec.so.1
/usr/local/lib/libGLdispatch.so.0
/usr/local/lib/libXau.so.6
/usr/local/lib/libXdmcp.so.6
/usr/local/lib/libGLX.so.0
/usr/local/lib/libbrotlicommon.so.1
r_debug_state (rd=<optimized out>, m=<optimized out>) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4485
4485 }
(gdb) c
Continuing.
Breakpoint 3, get_program_var_addr (name=0x200435bf "__pthread_map_stacks_exec", lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Breakpoint 8.3, _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
140 round_up(thrd->attr.guardsize_attr),
(gdb) bt
#0 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#1 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#2 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#3 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290)
at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#4 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#5 0x00020510 in main () at dlopen_test.c:14
(gdb) s
139 mprotect((char *)thrd->attr.stackaddr_attr +
(gdb) s
141 round_up(thrd->attr.stacksize_attr),
(gdb) s
140 round_up(thrd->attr.guardsize_attr),
(gdb) s
round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:129
129 if (size % _thr_page_size != 0)
(gdb) s
130 size = ((size / _thr_page_size) + 1) *
(gdb) bt
#0 round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
#1 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#2 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#3 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#4 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290)
at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#5 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#6 0x00020510 in main () at dlopen_test.c:14
(gdb) si
129 if (size % _thr_page_size != 0)
(gdb)
130 size = ((size / _thr_page_size) + 1) *
(gdb) bt
#0 round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
#1 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#2 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#3 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#4 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290)
at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#5 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#6 0x00020510 in main () at dlopen_test.c:14
(gdb) disass /s
Dump of assembler code for function __thr_map_stacks_exec:
. . .
130 size = ((size / _thr_page_size) + 1) *
0x20112eec <+340>: mov r0, r6
129 if (size % _thr_page_size != 0)
0x20112ef0 <+344>: ldr r4, [pc, r4]
130 size = ((size / _thr_page_size) + 1) *
=> 0x20112ef4 <+348>: mov r1, r4
0x20112ef8 <+352>: bl 0x20116b60
NOTE: 0x20116760 - 0x20116f30 is .plt in /lib/libthr.so.3
--Type <RET> for more, q to quit, c to continue without paging--
0x20112efc <+356>: mov r9, r0
0x20112f00 <+360>: mov r0, r5
0x20112f04 <+364>: mov r1, r4
0x20112f08 <+368>: bl 0x20116b60
NOTE: 0x20116760 - 0x20116f30 is .plt in /lib/libthr.so.3
0x20112f0c <+372>: mls r1, r0, r4, r5
. . .
(gdb) si
0x20112ef8 130 size = ((size / _thr_page_size) + 1) *
(gdb)
0x20116b60 in ?? () from /lib/libthr.so.3
(gdb) bt
#0 0x20116b60 in ?? () from /lib/libthr.so.3
#1 0x20112efc in round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
#2 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#3 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#4 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#5 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290)
at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#6 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#7 0x00020510 in main () at dlopen_test.c:14
(gdb) si
0x20116b64 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116b68 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116760 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116764 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116768 in ?? () from /lib/libthr.so.3
(gdb) si
0x2011676c in ?? () from /lib/libthr.so.3
(gdb) si
_rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:78
78 stmdb sp!,{r0-r5,sl,fp}
(gdb) bt
#0 _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:78
#1 0x201373b0 in ?? () from /lib/libthr.so.3
NOTE: 0x201373a8 - 0x201375a0 is .got.plt in /lib/libthr.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

It turns out that _thr_rtld_rlock_acquire is looping when the
process is stuck:
. . .
(gdb) bt
#0 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
#1 0x20060788 in rlock_acquire (lock=0x2008af10 <rtld_locks>, lockstate=lockstate@entry=0xffffd0ec) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld_lock.c:259
#2 0x20059098 in _rtld_bind (obj=0x2008f404, reloff=496) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:1035
#3 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
#4 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
#5 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
. . .
(gdb) info threads
  Id   Target Id                    Frame
* 1    LWP 100174 of process 97711 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
So there is only one thread, the main thread.
It keeps repeating the _thr_rwlock_rdlock loop (lines 121/122):
(gdb) list 115
110 _thr_rtld_rlock_acquire(void *lock)
111 {
112 struct pthread *curthread;
113 struct rtld_lock *l;
114 int errsave;
115
116 curthread = _get_curthread();
117 SAVE_ERRNO();
118 l = (struct rtld_lock *)lock;
119
(gdb)
120 THR_CRITICAL_ENTER(curthread);
121 while (_thr_rwlock_rdlock(&l->lock, 0, NULL) != 0)
122 ;
123 curthread->rdlock_count++;
124 RESTORE_ERRNO();
125 }
>> , and backtrace all threads from userspace.
>> I only need userspace backtrace, not either kernel-side stacks nor
>> the syscall history.
>>
>> Are you sure that the issue is specific to armv7, might be it takes more
>> efforts to reproduce on host native?
===
Mark Millard
marklmi at yahoo.com