Unkillable KSE threaded proc
Andrew Gallatin
gallatin at cs.duke.edu
Wed Sep 15 07:22:59 PDT 2004
Julian Elischer writes:
> either of :
> http://www.freebsd.org/~julian/q.diff
>
> or
>
> http://www.freebsd.org/~julian/r.diff
>
> Might make some difference.
>
> today's q.diff has a fix that was missing yesterday.
Both seem the same as unpatched head -- app starts, runs normally,
then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly
deadlocking the system.
But -- I think I now have a clue as to what's going on. I started a
ktrace of the problematic process just before doing the skill -9, and
afterwards it kept on tracing.
I noticed it was stuck doing this:
569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call
569 mx_pingpong Events dropped.
569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call
569 mx_pingpong Events dropped.
569 mx_pingpong RET ioctl -1 errno 4 Interrupted system call
It turns out that the userspace code is basically doing:
    do {
        MUTEX_LOCK(&lock);
        should_exit = work();
        MUTEX_UNLOCK(&lock);
        ioctl(fd, DRIVER_WAIT);
    } while (!should_exit);
    return NULL;
Changing it to
    <...>
        rv = ioctl(fd, DRIVER_WAIT);
    } while ((rv == 0 || errno == EWOULDBLOCK) && !should_exit);
    return NULL;
seems to work around the problem with your r.diff patch applied to
head. The ioctl in the driver boils down to a cv_timedwait_sig(),
which is where the EINTR (errno 4 in the ktrace output) is coming from.
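
For illustration, here is a minimal sketch of that loop shape in C. It
assumes the wait call behaves like ioctl(2), returning -1 with errno set
on failure. The names wait_loop, wait_fn, work_fn and the demo_* helpers
are hypothetical stand-ins, not the real driver interface, and the mutex
around work() is left out to keep it short.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for ioctl(fd, DRIVER_WAIT): returns 0 on a normal
 * wakeup, or -1 with errno set, as ioctl(2) does. */
typedef int (*wait_fn)(void *ctx);

/* Hypothetical stand-in for work(): returns true when the thread should exit. */
typedef bool (*work_fn)(void *ctx);

/*
 * Run work/wait until work asks to exit or the wait fails with anything
 * other than a timeout. Returns 0 if the loop ended only because work
 * asked to exit, otherwise the errno that ended the loop (EINTR when a
 * signal interrupted the wait).
 */
static int
wait_loop(wait_fn waitfn, work_fn workfn, void *ctx)
{
	bool should_exit;
	int rv, err;

	do {
		should_exit = workfn(ctx);
		rv = waitfn(ctx);
		err = (rv == -1) ? errno : 0;	/* save errno right after the call */
	} while ((rv == 0 || err == EWOULDBLOCK) && !should_exit);

	return (rv == 0 || err == EWOULDBLOCK) ? 0 : err;
}

/* Demo helpers (assumptions): the wait succeeds fail_after times, then
 * fails with EINTR, mimicking a signal arriving during the sleep. */
struct demo {
	int calls;
	int fail_after;
};

static bool
demo_work(void *ctx)
{
	(void)ctx;
	return false;		/* never asks to exit on its own */
}

static int
demo_wait(void *ctx)
{
	struct demo *d = ctx;

	if (++d->calls > d->fail_after) {
		errno = EINTR;
		return -1;
	}
	return 0;
}
```

Calling wait_loop(demo_wait, demo_work, &d) with d.fail_after set to 3
makes the loop stop on the fourth wait instead of retrying on EINTR
forever, which is the behaviour the original loop was missing.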
Even if this is our bug, I think that a user-level bug like this should
not be able to deadlock the system...
FWIW, even with the fix to the user-level code, we still have the
original problem (one lingering thread using no CPU) in RELENG_5.
Drew