misc/117603: dump(8) hangs on SMP - 4way and higher.
Danny Braniss
danny at cs.huji.ac.il
Sun Oct 28 08:10:01 PDT 2007
>Number: 117603
>Category: misc
>Synopsis: dump(8) hangs on SMP - 4way and higher.
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Oct 28 15:10:01 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator: Danny Braniss
>Release: FreeBSD 7.0-BETA1 amd64
>Organization:
>Environment:
System: FreeBSD sunfire 7.0-BETA1 FreeBSD 7.0-BETA1 #1: Sat Oct 20 16:30:43 IST 2007 danny at sunfire:/r+d/obj/sunfire/r+d/7.0/src/sys/HUJI amd64
>Description:
dump creates four processes, three of which ('slaves') read from the disk
and, using a SIGUSR2-based handshake, take turns writing sequentially to
the tape or file. The method used to synchronize these slaves worked fine
on older, slower, non-SMP hosts, but on a dual-CPU, dual-core host it
hangs very frequently.
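The following is a minimal C sketch of the kind of turn-taking described above. It is not dump's code. Three worker processes each wait for SIGUSR2, append their id to a shared pipe (standing in for the tape), and then signal the next worker. The names NWORKERS, ROUNDS, proceed(), wait_turn(), worker() and run_round_robin() are hypothetical. Passing each worker the next worker's pid over a pipe is an assumption of this sketch. The wait follows the approach the patch in this report adopts: SIGUSR2 is blocked before forking and is only accepted inside sigsuspend().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 3 /* hypothetical: mirrors dump's three slaves */
#define ROUNDS 4   /* hypothetical: turns taken by each worker */

static volatile sig_atomic_t caught; /* set when our turn signal arrives */

static void proceed(int signo)
{
	(void)signo;
	caught = 1;
}

/* SIGUSR2 stays blocked except while suspended, so it cannot be missed. */
static void wait_turn(const sigset_t *waitmask)
{
	while (!caught)
		sigsuspend(waitmask);
	caught = 0;
}

static void worker(unsigned char id, int ctl_fd, int out_fd,
    const sigset_t *waitmask)
{
	pid_t next;

	/* The parent sends the next worker's pid once all workers exist. */
	if (read(ctl_fd, &next, sizeof next) != (ssize_t)sizeof next)
		_exit(1);
	for (int r = 0; r < ROUNDS; r++) {
		wait_turn(waitmask);
		if (write(out_fd, &id, 1) != 1) /* stands in for the tape write */
			_exit(1);
		(void)kill(next, SIGUSR2);      /* hand the turn to the next worker */
	}
	_exit(0);
}

/* Records the order in which workers wrote; returns the count or -1. */
static int run_round_robin(unsigned char *order, size_t cap)
{
	struct sigaction sa, old_sa;
	sigset_t block, oldmask, waitmask;
	int ctl[NWORKERS][2], out[2], n = 0;
	pid_t pids[NWORKERS];
	unsigned char c;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = proceed;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR2, &sa, &old_sa);

	/* Block SIGUSR2 before forking; the children inherit this mask. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR2);
	sigprocmask(SIG_BLOCK, &block, &oldmask);
	waitmask = oldmask;
	sigdelset(&waitmask, SIGUSR2); /* mask used while suspended */

	if (pipe(out) == -1)
		return -1;
	for (int i = 0; i < NWORKERS; i++) {
		if (pipe(ctl[i]) == -1 || (pids[i] = fork()) == -1)
			return -1;
		if (pids[i] == 0)
			worker((unsigned char)i, ctl[i][0], out[1], &waitmask);
	}
	close(out[1]);
	for (int i = 0; i < NWORKERS; i++) {
		pid_t next = pids[(i + 1) % NWORKERS];
		if (write(ctl[i][1], &next, sizeof next) != (ssize_t)sizeof next)
			return -1;
	}
	(void)kill(pids[0], SIGUSR2); /* worker 0 goes first */

	/* EOF arrives once every worker has exited. */
	while (read(out[0], &c, 1) == 1)
		if ((size_t)n < cap)
			order[n++] = c;
	for (int i = 0; i < NWORKERS; i++) {
		close(ctl[i][1]);
		(void)waitpid(pids[i], NULL, 0);
	}
	close(out[0]);
	sigprocmask(SIG_SETMASK, &oldmask, NULL);
	sigaction(SIGUSR2, &old_sa, NULL);
	return n;
}
```

Because each worker writes before passing the turn on, the recorded order is 0, 1, 2, 0, 1, 2, and so on. If a turn signal were ever lost, the worker waiting for it would sleep forever and the whole ring would stall. That is consistent with the hang described above.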
>How-To-Repeat:
dump 0aLf /some/file /
>Fix:
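Patch follows. It removes the setjmp()/longjmp() handshake, blocks SIGUSR2
before the slaves are forked, and waits for the turn signal with
sigsuspend() instead of pause().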
--- tape.c.orig 2005-03-02 04:30:08.000000000 +0200
+++ tape.c 2007-10-28 16:17:46.728015000 +0200
@@ -109,11 +109,8 @@
int master; /* pid of master, for sending error signals */
int tenths; /* length of tape used per block written */
+
static volatile sig_atomic_t caught; /* have we caught the signal to proceed? */
-static volatile sig_atomic_t ready; /* reached the lock point without having */
- /* received the SIGUSR2 signal from the prev slave? */
-static jmp_buf jmpbuf; /* where to jump to if we are ready when the */
- /* SIGUSR2 arrives from the previous slave */
int
alloctape(void)
@@ -685,15 +682,13 @@
void
proceed(int signo __unused)
{
-
- if (ready)
- longjmp(jmpbuf, 1);
caught++;
}
void
enslave(void)
{
+ sigset_t s_mask;
int cmd[2];
int i, j;
@@ -704,6 +699,10 @@
signal(SIGUSR1, tperror); /* Slave sends SIGUSR1 on tape errors */
signal(SIGUSR2, proceed); /* Slave sends SIGUSR2 to next slave */
+ sigemptyset(&s_mask);
+ sigaddset(&s_mask, SIGUSR2);
+ sigprocmask(SIG_BLOCK, &s_mask, NULL);
+
for (i = 0; i < SLAVES; i++) {
if (i == slp - &slaves[0]) {
caught = 1;
@@ -793,12 +792,8 @@
quit("master/slave protocol botched.\n");
}
}
- if (setjmp(jmpbuf) == 0) {
- ready = 1;
- if (!caught)
- (void) pause();
- }
- ready = 0;
+ if(!caught)
+ sigsuspend(0);
caught = 0;
/* Try to write the data... */
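The core of the fix is to keep SIGUSR2 blocked while testing `caught`, and only then atomically unblock it and sleep. Below is a minimal, self-contained C sketch of that wait. It is not code taken from dump(8), and the function name early_signal_is_kept() is hypothetical. The sketch sends SIGUSR2 to itself before waiting, as if the previous slave had signalled early, and checks that the signal is still delivered by the wait. POSIX defines sigsuspend() as taking a pointer to the signal mask to install while waiting. The sketch therefore passes the original mask with SIGUSR2 removed, rather than the 0 used in the patch. It also re-tests `caught` in a while loop in case some other signal ends the wait first.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t caught; /* set by the SIGUSR2 handler */

static void proceed(int signo)
{
	(void)signo;
	caught = 1;
}

/*
 * Returns 1 if a SIGUSR2 raised before the wait starts is held pending
 * and then delivered by the sigsuspend() loop, 0 if not, -1 on error.
 */
static int early_signal_is_kept(void)
{
	struct sigaction sa, old_sa;
	sigset_t block, oldmask, waitmask;
	int handled_early;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = proceed;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR2, &sa, &old_sa) == -1)
		return -1;

	/* Block SIGUSR2 first, as the patch does in enslave(). */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR2);
	sigprocmask(SIG_BLOCK, &block, &oldmask);
	waitmask = oldmask;
	sigdelset(&waitmask, SIGUSR2); /* unblocked only while suspended */

	kill(getpid(), SIGUSR2);  /* the "previous slave" signals early */
	handled_early = caught;   /* still 0: the signal is only pending */

	/* No window between testing caught and going to sleep. */
	while (!caught)
		sigsuspend(&waitmask);
	caught = 0;

	sigprocmask(SIG_SETMASK, &oldmask, NULL);
	sigaction(SIGUSR2, &old_sa, NULL);
	return handled_early == 0;
}
```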
>Release-Note:
>Audit-Trail:
>Unformatted: