system() using vfork() or posix_spawn() and libthr

Konstantin Belousov kostikbel at gmail.com
Fri Aug 17 12:43:21 UTC 2012


On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote:
> On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
> > My point is that the fact that fork() is called from dynamic context
> > that was identified as the signal handler does not mean anything.
> > It can be mis-identified for many reasons, which both me and you
> > tried to partially enumerate above.
> 
> > The really important thing is the atomicity of the current context,
> > e.g. synchronous SIGSEGV caused by a language runtime GC is very
> > much safe place to call atfork handlers, since runtimes usually cause
> > signal generations only at the safe place.
> 
> > I do not think that such approach as you described can be completed,
> > the _Fork() seems the only robust way.
> 
> Agreed, that way (detecting signal handler) lies madness.
> 
> If need be, _Fork() is easily implemented and used.
> 
> > BTW, returning to Jilles proposal, can we call vfork() only for
> > single-threaded parent ? I think it gives good boost for single-threaded
> > case, and also eliminates the concerns of non-safety.
> 
> This would probably fix the safety issues but at a price. There is a
> correlation between processes so large that they benefit greatly from
> vfork and threaded processes.
Ok, so I will continue with my patch.

> 
> On the other hand, I think direct calls to vfork() in applications are
> risky and it may not be possible to support them safely in all
> circumstances. However, if libc is calling vfork() such as via popen(),
> system() or posix_spawn(), it should be possible even in a
> multi-threaded process. In that case, the rtld and libthr problems can
> be avoided by using aliases with hidden visibility for all functions the
> vforked child needs to call (or any other method that prevents
> interposition and hard-codes a displacement into libc.so).
I do not see how using any aliases could help there. Basically, if mt
process is not single-threaded for vfork, you can have both some parent
thread and child enter rtld. This is complete mess.

> 
> There may still be a problem in programs that install signal handlers
> because the set of async-signal-safe functions is larger than what can
> be done in a vforked child. Userland can avoid this by masking affected
> signals before calling vfork() and resetting them to SIG_DFL before
> unmasking them. This will add many syscalls if the code does not know
> which signals are affected (such as libc). Alternatively, the kernel
> could map caught signals to the default action for processes with
> P_PPWAIT (just like it does not stop such processes because of signals
> or TTY job control).

If rtld does not work, then any library function call from a signal handler
is problematic. My goal is to get a working rtld and possibly malloc.

BTW, not quite related, it seems that the placement of openat,
setcontext and swapcontext in the pthread.map is wrong. openat is
at FBSD_1.1 in libc, and *context are at FBSD_1.0 version, while
libthr exports them at 1.2. App or library gets linked to arbitrary
version depending on whether libphread was specified at link time, and
interposition from libthr does not work.

Below is the latest version of my patch for vfork, which survives (modified)
tools/test/pthread_vfork_test. Patch is only for x86 right now.

diff --git a/lib/libthr/arch/amd64/Makefile.inc b/lib/libthr/arch/amd64/Makefile.inc
index e6d99ec..476d26a 100644
--- a/lib/libthr/arch/amd64/Makefile.inc
+++ b/lib/libthr/arch/amd64/Makefile.inc
@@ -1,3 +1,4 @@
 #$FreeBSD$
 
-SRCS+=	pthread_md.c _umtx_op_err.S
+CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH}
+SRCS+=	pthread_md.c _umtx_op_err.S vfork.S
diff --git a/lib/libthr/arch/amd64/amd64/vfork.S b/lib/libthr/arch/amd64/amd64/vfork.S
new file mode 100644
index 0000000..1f0714f
--- /dev/null
+++ b/lib/libthr/arch/amd64/amd64/vfork.S
@@ -0,0 +1,77 @@
+/*-
+ * Copyright (c) 2012 Konstantin Belousov <kib at freebsd.org>
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * William Jolitz.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <machine/asm.h>
+__FBSDID("$FreeBSD$");
+
+#include "SYS.h"
+
+	.weak	_vfork
+	.set	_vfork,__sys_vfork
+	.weak	vfork
+	.set	vfork,__sys_vfork
+ENTRY(__sys_vfork)
+	call	_thr_vfork_pre
+	popq	%rsi		/* fetch return address (%rsi preserved) */
+	mov	$SYS_vfork,%rax
+	KERNCALL
+	jb	2f
+	cmpl	$0,%eax
+	jne	1f
+	pushq	%rsi
+	pushq	%rsi /* twice for stack alignment */
+	call	CNAME(_thr_vfork_post_child)
+	popq	%rsi
+	popq	%rsi
+	xorl	%eax,%eax
+	jmp	*%rsi
+1:
+	pushq	%rsi
+	pushq	%rax
+	call	CNAME(_thr_vfork_post_child)
+	popq	%rax
+	popq	%rsi
+	jmp	*%rsi
+2:
+	pushq	%rsi
+	pushq	%rax
+	call	CNAME(_thr_vfork_post_child)
+	call	PIC_PLT(CNAME(__error))
+	popq	%rdx
+	movq	%rdx,(%rax)
+	movq	$-1,%rax
+	movq	%rax,%rdx
+	retq
+END(__sys_vfork)
+
+	.section .note.GNU-stack,"",%progbits
diff --git a/lib/libthr/arch/i386/Makefile.inc b/lib/libthr/arch/i386/Makefile.inc
index 01290d5..13d55e2 100644
--- a/lib/libthr/arch/i386/Makefile.inc
+++ b/lib/libthr/arch/i386/Makefile.inc
@@ -1,3 +1,4 @@
 # $FreeBSD$
 
-SRCS+=	pthread_md.c _umtx_op_err.S
+CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH}
+SRCS+=	pthread_md.c _umtx_op_err.S vfork.S
diff --git a/lib/libthr/arch/i386/i386/vfork.S b/lib/libthr/arch/i386/i386/vfork.S
new file mode 100644
index 0000000..ca8896b
--- /dev/null
+++ b/lib/libthr/arch/i386/i386/vfork.S
@@ -0,0 +1,82 @@
+/*-
+ * Copyright (c) 2012 Konstantin Belousov <kib at freebsd.org>
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * William Jolitz.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <machine/asm.h>
+__FBSDID("$FreeBSD$");
+
+#include "SYS.h"
+
+	.weak	_vfork
+	.set	_vfork,__sys_vfork
+	.weak	vfork
+	.set	vfork,__sys_vfork
+ENTRY(__sys_vfork)
+	PIC_PROLOGUE
+	call	PIC_PLT(CNAME(_thr_vfork_pre))
+	PIC_EPILOGUE
+	popl	%ecx		/* my rta into ecx */
+	mov	$SYS_vfork,%eax
+	KERNCALL
+	jb	error_ret
+	pushl	%ecx
+	cmpl	$0,%eax
+	jne	parent_ret
+	PIC_PROLOGUE
+	call	PIC_PLT(CNAME(_thr_vfork_post_child))
+	PIC_EPILOGUE
+	popl	%ecx
+	xorl	%eax,%eax
+	jmp	*%ecx
+parent_ret:
+	pushl	%eax
+	PIC_PROLOGUE
+	call	PIC_PLT(CNAME(_thr_vfork_post_parent))
+	PIC_EPILOGUE
+	popl	%eax
+	popl	%ecx
+	jmp	*%ecx
+error_ret:
+	pushl	%ecx
+	pushl	%eax
+	PIC_PROLOGUE
+	call	PIC_PLT(CNAME(_thr_vfork_post_child))
+	call	PIC_PLT(CNAME(__error))
+	PIC_EPILOGUE
+	popl	%edx
+	movl	%edx,(%eax)
+	movl	$-1,%eax
+	movl	%eax,%edx
+	retl
+END(__sys_vfork)
+
+	.section .note.GNU-stack,"",%progbits
diff --git a/lib/libthr/pthread.map b/lib/libthr/pthread.map
index 355edea..40d14b4 100644
--- a/lib/libthr/pthread.map
+++ b/lib/libthr/pthread.map
@@ -157,6 +157,7 @@ FBSD_1.0 {
 	system;
 	tcdrain;
 	usleep;
+	vfork;
 	wait;
 	wait3;
 	wait4;
diff --git a/lib/libthr/thread/Makefile.inc b/lib/libthr/thread/Makefile.inc
index ddde6e9..92e82ac 100644
--- a/lib/libthr/thread/Makefile.inc
+++ b/lib/libthr/thread/Makefile.inc
@@ -55,4 +55,5 @@ SRCS+= \
 	thr_switch_np.c \
 	thr_symbols.c \
 	thr_umtx.c \
+	thr_vfork.c \
 	thr_yield.c
diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
index ba272fe..951c3b8 100644
--- a/lib/libthr/thread/thr_private.h
+++ b/lib/libthr/thread/thr_private.h
@@ -757,6 +757,7 @@ void	_thr_cancel_leave(struct pthread *, int) __hidden;
 void	_thr_testcancel(struct pthread *) __hidden;
 void	_thr_signal_block(struct pthread *) __hidden;
 void	_thr_signal_unblock(struct pthread *) __hidden;
+void	_thr_signal_unblock_noref(struct pthread *) __hidden;
 void	_thr_signal_init(void) __hidden;
 void	_thr_signal_deinit(void) __hidden;
 int	_thr_send_sig(struct pthread *, int sig) __hidden;
@@ -777,6 +778,9 @@ int	_thr_setscheduler(lwpid_t, int, const struct sched_param *) __hidden;
 void	_thr_signal_prefork(void) __hidden;
 void	_thr_signal_postfork(void) __hidden;
 void	_thr_signal_postfork_child(void) __hidden;
+void	_thr_vfork_pre(void) __hidden;
+void	_thr_vfork_post_child(void) __hidden;
+void	_thr_vfork_post_parent(void) __hidden;
 void	_thr_try_gc(struct pthread *, struct pthread *) __hidden;
 int	_rtp_to_schedparam(const struct rtprio *rtp, int *policy,
 		struct sched_param *param) __hidden;
@@ -833,6 +837,7 @@ pid_t	__sys_getpid(void);
 ssize_t __sys_read(int, void *, size_t);
 ssize_t __sys_write(int, const void *, size_t);
 void	__sys_exit(int);
+pid_t	__sys_vfork(void);
 #endif
 
 static inline int
diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
index 3dee8b7..3e5b25e 100644
--- a/lib/libthr/thread/thr_sig.c
+++ b/lib/libthr/thread/thr_sig.c
@@ -107,6 +107,13 @@ _thr_signal_unblock(struct pthread *curthread)
 		__sys_sigprocmask(SIG_SETMASK, &curthread->sigmask, NULL);
 }
 
+void
+_thr_signal_unblock_noref(struct pthread *curthread)
+{
+	if (curthread->sigblock == 0)
+		__sys_sigprocmask(SIG_SETMASK, &curthread->sigmask, NULL);
+}
+
 int
 _thr_send_sig(struct pthread *thread, int sig)
 {
diff --git a/lib/libthr/thread/thr_vfork.c b/lib/libthr/thread/thr_vfork.c
new file mode 100644
index 0000000..80c9d1e
--- /dev/null
+++ b/lib/libthr/thread/thr_vfork.c
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2012 Konstantin Belousov <kib at freebsd.org>
+ * Copyright (c) 2005 David Xu <davidxu at freebsd.org>
+ * Copyright (c) 2003 Daniel Eischen <deischen at freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Neither the name of the author nor the names of any co-contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+/*
+ * Copyright (c) 1995-1998 John Birrell <jb at cimlogic.com.au>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the author nor the names of any co-contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY JOHN BIRRELL AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "namespace.h"
+#include <errno.h>
+#include <link.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <spinlock.h>
+#include "un-namespace.h"
+
+#include "libc_private.h"
+#include "rtld_lock.h"
+#include "thr_private.h"
+
+static int rtld_locks[MAX_RTLD_LOCKS];
+static int cancelsave, was_threaded;
+
+void
+_thr_vfork_pre(void)
+{
+	struct pthread *curthread;
+
+	curthread = _get_curthread();
+	_thr_rwl_wrlock(&_thr_atfork_lock);
+	cancelsave = curthread->no_cancel;
+	curthread->no_cancel = 1;
+	_thr_signal_block(curthread);
+	_thr_signal_prefork();
+	if (_thr_isthreaded() != 0) {
+		was_threaded = 1;
+		_malloc_prefork();
+		_rtld_atfork_pre(rtld_locks);
+	} else
+		was_threaded = 0;
+}
+
+void
+_thr_vfork_post_child(void)
+{
+	struct pthread *curthread;
+
+	_thr_signal_postfork();
+	if (was_threaded) {
+		_rtld_atfork_post(rtld_locks);
+		_malloc_postfork();
+	}
+	curthread = _get_curthread();
+	_thr_signal_unblock(curthread);
+	curthread->no_cancel = cancelsave;
+	_thr_rwlock_unlock(&_thr_atfork_lock);
+	if (curthread->cancel_async)
+		_thr_testcancel(curthread);
+}
+
+void
+_thr_vfork_post_parent(void)
+{
+	struct pthread *curthread;
+
+	curthread = _get_curthread();
+	_thr_signal_unblock_noref(curthread);
+	if (curthread->cancel_async)
+		_thr_testcancel(curthread);
+}
diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index 46cdca1..134ba80 100644
--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -150,11 +150,7 @@ sys_vfork(struct thread *td, struct vfork_args *uap)
 	int error, flags;
 	struct proc *p2;
 
-#ifdef XEN
-	flags = RFFDG | RFPROC; /* validate that this is still an issue */
-#else
 	flags = RFFDG | RFPROC | RFPPWAIT | RFMEM;
-#endif		
 	error = fork1(td, flags, 0, &p2, NULL, 0);
 	if (error == 0) {
 		td->td_retval[0] = p2->p_pid;
@@ -756,7 +752,7 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp,
 	struct thread *td2;
 	struct vmspace *vm2;
 	vm_ooffset_t mem_charged;
-	int error;
+	int error, single_threaded;
 	static int curfail;
 	static struct timeval lastfail;
 #ifdef PROCDESC
@@ -815,6 +811,19 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp,
 	}
 #endif
 
+	if (((p1->p_flag & (P_HADTHREADS | P_SYSTEM)) == P_HADTHREADS) &&
+	    (flags & RFPPWAIT) != 0) {
+		PROC_LOCK(p1);
+		if (thread_single(SINGLE_BOUNDARY)) {
+			PROC_UNLOCK(p1);
+			error = ERESTART;
+			goto fail2;
+		}
+		PROC_UNLOCK(p1);
+		single_threaded = 1;
+	} else
+		single_threaded = 0;
+
 	mem_charged = 0;
 	vm2 = NULL;
 	if (pages == 0)
@@ -945,6 +954,12 @@ fail1:
 	if (vm2 != NULL)
 		vmspace_free(vm2);
 	uma_zfree(proc_zone, newproc);
+	if (single_threaded) {
+		PROC_LOCK(p1);
+		thread_single_end();
+		PROC_UNLOCK(p1);
+	}
+fail2:
 #ifdef PROCDESC
 	if (((flags & RFPROCDESC) != 0) && (fp_procdesc != NULL)) {
 		fdclose(td->td_proc->p_fd, fp_procdesc, *procdescp, td);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20120817/94327a36/attachment.pgp


More information about the freebsd-hackers mailing list