bin/117603: [patch] dump(8) hangs on SMP - 4way and higher.

Ian Dowse iedowse at iedowse.com
Sun Nov 4 04:57:26 PST 2007


Hi,

It's probably better to keep the caught variable, since isn't there
a risk of a different signal completing the sigsuspend(), e.g.
SIGINFO caused by Ctrl-T? This is why the patch I posted has a while
loop around the sigsuspend call - it presumbably was an issue with
the original pause() call too.

Would you be able to test the patch I posted? I haven't yet found
a setup here that triggers the original problem, so it would be
good to know if using sigsuspend alone is enough. It wasn't clear
to me what exactly was involved in the original race - the use of
volatiles and the setjmp()/longjmp(), while fairly horrible, did
appear to cover the more obvious race conditions.

Ian

In message <E1IodwH-000GBl-Qc at cs1.cs.huji.ac.il>, Danny Braniss writes:
>This is a multipart MIME message.
>
>--==_Exmh_1194176785_547800
>Content-Type: text/plain; charset=us-ascii
>
>wups, sent an old diffs.
>
>
>--==_Exmh_1194176785_547800
>Content-Type: text/plain ; name="dump.c.diffs"; charset=us-ascii
>Content-Description: dump.c.diffs
>Content-Disposition: attachment; filename="dump.c.diffs"
>
>--- tape.c.orig	Wed Mar  2 04:30:08 2005
>+++ tape.c	Sun Nov  4 13:42:55 2007
>@@ -109,11 +109,6 @@
> 
> int master;		/* pid of master, for sending error signals */
> int tenths;		/* length of tape used per block written */
>-static volatile sig_atomic_t caught; /* have we caught the signal to proceed?
> */
>-static volatile sig_atomic_t ready; /* reached the lock point without having 
>*/
>-			/* received the SIGUSR2 signal from the prev slave? */
>-static jmp_buf jmpbuf;	/* where to jump to if we are ready when the */
>-			/* SIGUSR2 arrives from the previous slave */
> 
> int
> alloctape(void)
>@@ -685,15 +680,13 @@
> void
> proceed(int signo __unused)
> {
>-
>-	if (ready)
>-		longjmp(jmpbuf, 1);
>-	caught++;
>+     // do nothing ...
> }
> 
> void
> enslave(void)
> {
>+	sigset_t	s_mask;
> 	int cmd[2];
> 	int i, j;
> 
>@@ -704,13 +697,11 @@
> 	signal(SIGUSR1, tperror);    /* Slave sends SIGUSR1 on tape errors */
> 	signal(SIGUSR2, proceed);    /* Slave sends SIGUSR2 to next slave */
> 
>-	for (i = 0; i < SLAVES; i++) {
>-		if (i == slp - &slaves[0]) {
>-			caught = 1;
>-		} else {
>-			caught = 0;
>-		}
>+	sigemptyset(&s_mask);
>+	sigaddset(&s_mask, SIGUSR2);
>+	sigprocmask(SIG_BLOCK, &s_mask, NULL);
> 
>+	for (i = 0; i < SLAVES; i++) {
> 		if (socketpair(AF_UNIX, SOCK_STREAM, 0, cmd) < 0 ||
> 		    (slaves[i].pid = fork()) < 0)
> 			quit("too many slaves, %d (recompile smaller): %s\n",
>@@ -733,6 +724,7 @@
> 		              sizeof slaves[0].pid);
> 
> 	master = 0;
>+	kill(slp->pid, SIGUSR2); // start the ball rolling
> }
> 
> void
>@@ -757,6 +749,7 @@
> static void
> doslave(int cmd, int slave_number)
> {
>+	sigset_t s_mask;
> 	int nread;
> 	int nextslave, size, wrote, eot_count;
> 
>@@ -774,7 +767,7 @@
> 	    != sizeof nextslave) {
> 		quit("master/slave protocol botched - didn't get pid of next sl
>ave.\n");
> 	}
>-
>+	sigemptyset(&s_mask);
> 	/*
> 	 * Get list of blocks to dump, read the blocks into tape buffer
> 	 */
>@@ -793,14 +786,7 @@
> 				       quit("master/slave protocol botched.\n")
>;
> 			}
> 		}
>-		if (setjmp(jmpbuf) == 0) {
>-			ready = 1;
>-			if (!caught)
>-				(void) pause();
>-		}
>-		ready = 0;
>-		caught = 0;
>-
>+		sigsuspend(&s_mask);
> 		/* Try to write the data... */
> 		eot_count = 0;
> 		size = 0;
>
>--==_Exmh_1194176785_547800--
>
>


More information about the freebsd-bugs mailing list