Re: closefrom blocking, wchan urdlck

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 27 Dec 2021 15:19:16 UTC
On Mon, Dec 27, 2021 at 04:13:50PM +0100, Jan Mikkelsen wrote:
> 
> 
> > On 27 Dec 2021, at 16:03, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Mon, Dec 27, 2021 at 03:54:57PM +0100, Jan Mikkelsen wrote:
> >> 
> >>> On 27 Dec 2021, at 14:52, Konstantin Belousov <kostikbel@gmail.com> wrote:
> >>> 
> >>> On Mon, Dec 27, 2021 at 01:39:11PM +0100, Jan Mikkelsen wrote:
> >>>> Hi,
> >>>> 
> >>>> (On 11.2)
> >>>> 
> >>>> I am occasionally seeing closefrom() block in a child process created by a call to pdfork().
> >>>> 
> >>>> When this does happen, it is very early after the process has started, while other threads are being created elsewhere in the process. I cannot reproduce it after the thread creation is complete. According to the sigaction man page, this should be async signal safe.
> >>>> 
> >>>> Stack trace from the call to closefrom():
> >>>> 
> >>>> * frame #0: 0x000000080090276c libthr.so.3`_umtx_op_err at _umtx_op_err.S:37
> >>>>   frame #1: 0x00000008008f6121 libthr.so.3`__thr_rwlock_rdlock(rwlock=<unavailable>, flags=<unavailable>, tsp=<unavailable>) at thr_umtx.c:307:10
> >>>>   frame #2: 0x00000008008ff1ac libthr.so.3`_thr_rtld_rlock_acquire [inlined] _thr_rwlock_rdlock(rwlock=0x0000000800911600, flags=0, tsp=0x0000000000000000) at thr_umtx.h:232:10
> >>>>   frame #3: 0x00000008008ff19b libthr.so.3`_thr_rtld_rlock_acquire(lock=0x0000000800911600) at thr_rtld.c:125
> >>>>   frame #4: 0x000000080075332b ld-elf.so.1`rlock_acquire(lock=0x0000000800765270, lockstate=0x00007fffdfbfb8d0) at rtld_lock.c:208:2
> >>>>   frame #5: 0x000000080074ba20 ld-elf.so.1`_rtld_bind(obj=0x0000000800769000, reloff=6072) at rtld.c:861:5
> >>>>   frame #6: 0x0000000800747c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121
> >>>>   frame #7: 0x00000000006562d3 prog`Twio::ProcHandle::spawn(this=<unavailable>, command="/bin/echo", args=0x0000000800d7e000, descriptor_mapping=<unavailable>, descriptor_end=3) at prochandle_pdfork.cpp:308:2
> >>> And where is the closefrom() call in the demonstrated trace?
> >>> 
> >>> What version of the system do you use?
> >>> You need at least cbdec8db18b533f6d7be (on HEAD) or a5659943e37a74c96e
> >>> (stable/13) for pdfork() to behave sanely.  But you still not allowed to
> >>> call non-async signal safe functions in the child before exec.
> >> 
> >> 
> >> This is 12.2-p11. I just noticed that I wrote 11.2 above, that is incorrect.
> >> 
> >> Frame 7 is a call to closefrom(). The child process calls dup2(), closefrom(), signal() and then execv(). No other calls are made, and I believe closefrom() is meant to be async signal safe.
> >> 
> > Frame 7 cannot be a call to closefrom(), it would be resolved to closefrom()
> > symbol would it be.
> 
> >From lldb, attached to the hung process:
> 
> (lldb) 
> frame #6: 0x0000000800748c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121
>    118 		leaq	(%rsi,%rsi,2),%rsi	# multiply by 3
>    119 		leaq	(,%rsi,8),%rsi		# now 8, for 24 (sizeof Elf_Rela)
>    120 	
> -> 121 		call	_rtld_bind		# Transfer control to the binder
>    122 		/* Now %rax contains the entry point of the function being called. */
>    123 	
>    124 		movq	%rax,0x60(%rsp)		# Store target over reloff argument
> (lldb) 
> frame #7: 0x0000000000656813 amt5-chefd`Twio::ProcHandle::spawn(this=<unavailable>, command="/bin/date", args=0x0000000800d5f010, descriptor_mapping=<unavailable>, descriptor_end=3) at prochandle_pdfork.cpp:304:2
>    301 			_exit(127);
>    302 		}
>    303 	
> -> 304 		closefrom(descriptor_end);
>    305 	
>    306 		signal(SIGPIPE, SIG_DFL);
>    307 		signal(SIGALRM, SIG_DFL);
> (lldb) 
Ah, yes, it is rtld trying to resolve the symbol.  So yes, the fix is to
have the revisions I listed below.  I am not sure about prerequisites.

> 
> 
> >> The commit you can apply cleanly to 12.2, I’m running a build now. Are there other issues with pdfork in 12.2?
> > 
> > pdfork() with threading processes requires 21f749da82e755aafab1276 and the
> > followup cbdec8db18b533f6d7be.  I do not believe any of this is in 12.3,
> > and definitely not in 12.2.
> 
> Thanks, will check and apply.
> 
> Regards,
> 
> Jan M.
>