misc/117603: dump(8) hangs on SMP - 4way and higher.

Danny Braniss danny at cs.huji.ac.il
Sun Oct 28 08:10:01 PDT 2007


>Number:         117603
>Category:       misc
>Synopsis:       dump(8) hangs on SMP - 4way and higher.
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Oct 28 15:10:01 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Danny Braniss
>Release:        FreeBSD 7.0-BETA1 amd64
>Organization:
>Environment:
System: FreeBSD sunfire 7.0-BETA1 FreeBSD 7.0-BETA1 #1: Sat Oct 20 16:30:43 IST 2007 danny at sunfire:/r+d/obj/sunfire/r+d/7.0/src/sys/HUJI amd64


	
>Description:
	dump(8) creates 4 processes, 3 of which read from disk and,
	through a synchronization scheme, take turns writing sequentially
	to the tape or file. The method used to synchronize these
	'slaves' worked fine on older, slower, non-SMP hosts. On a
	dual-CPU, dual-core (4-way) machine it hangs very frequently.
>How-To-Repeat:
	dump 0aLf /some/file /
>Fix:
	patch follows.

--- tape.c.orig 2005-03-02 04:30:08.000000000 +0200
+++ tape.c      2007-10-28 16:17:46.728015000 +0200
@@ -109,11 +109,8 @@
 
 int master;            /* pid of master, for sending error signals */
 int tenths;            /* length of tape used per block written */
+
 static volatile sig_atomic_t caught; /* have we caught the signal to proceed? */
-static volatile sig_atomic_t ready; /* reached the lock point without having */
-                       /* received the SIGUSR2 signal from the prev slave? */
-static jmp_buf jmpbuf; /* where to jump to if we are ready when the */
-                       /* SIGUSR2 arrives from the previous slave */
 
 int
 alloctape(void)
@@ -685,15 +682,13 @@
 void
 proceed(int signo __unused)
 {
-
-       if (ready)
-               longjmp(jmpbuf, 1);
        caught++;
 }
 
 void
 enslave(void)
 {
+       sigset_t        s_mask;
        int cmd[2];
        int i, j;
 
@@ -704,6 +699,10 @@
        signal(SIGUSR1, tperror);    /* Slave sends SIGUSR1 on tape errors */
        signal(SIGUSR2, proceed);    /* Slave sends SIGUSR2 to next slave */
 
+       sigemptyset(&s_mask);
+       sigaddset(&s_mask, SIGUSR2);
+       sigprocmask(SIG_BLOCK, &s_mask, NULL);
+
        for (i = 0; i < SLAVES; i++) {
                if (i == slp - &slaves[0]) {
                        caught = 1;
@@ -793,12 +792,8 @@
                                       quit("master/slave protocol botched.\n");
                        }
                }
-               if (setjmp(jmpbuf) == 0) {
-                       ready = 1;
-                       if (!caught)
-                               (void) pause();
-               }
-               ready = 0;
+               if(!caught)
+                    sigsuspend(0);
                caught = 0;
 
                /* Try to write the data... */
>Release-Note:
>Audit-Trail:
>Unformatted:

