Unkillable KSE threaded proc

Andrew Gallatin gallatin at cs.duke.edu
Wed Sep 15 07:22:59 PDT 2004


Julian Elischer writes:
 > either of :
 > http://www.freebsd.org/~julian/q.diff
 > 
 > or
 > 
 > http://www.freebsd.org/~julian/r.diff
 > 
 > Might make some difference.
 > 
 > today's q.diff has a fix that was missing yesterday.

Both seem the same as unpatched head -- app starts, runs normally,
then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly
deadlocking the system.

But -- I think I now have a clue as to what's going on.  I started a
ktrace of the problematic process just before doing the skill -9, and
afterwards it kept on tracing.

I noticed it was stuck doing this:

   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
   569 mx_pingpong Events dropped.
   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
   569 mx_pingpong Events dropped.
   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call

It turns out that the userspace code is basically doing:

  do {
    MUTEX_LOCK(&lock);
    should_exit = work();
    MUTEX_UNLOCK(&lock);
    ioctl(fd, DRIVER_WAIT);
  } while (!should_exit);
  return NULL;

Changing it to

<...>
    rv = ioctl(fd, DRIVER_WAIT);
  } while ((rv == 0 || rv == EWOULDBLOCK) && !should_exit);
  return NULL;

Seems like it works around the problem with your r.diff patch applied
to head.  The ioctl in the driver boils down to a cv_timedwait_sig(),
which is where the EINTR is coming from.
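
For illustration, here is a minimal sketch of that kind of loop. It assumes
the wait call follows the usual syscall convention of returning -1 and
setting errno, which may not match how the real wrapper reports errors.
The callback types, wait_loop(), and the fake_driver helpers are all
hypothetical names introduced here so the loop can be exercised without
the actual driver; they are not part of the original code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for work() and ioctl(fd, DRIVER_WAIT). */
typedef bool (*work_fn)(void *ctx);  /* returns true when the thread should exit */
typedef int (*wait_fn)(void *ctx);   /* assumed: returns 0, or -1 with errno set */

/*
 * Run the work/wait loop. Returns 0 if the last wait succeeded,
 * otherwise the errno reported by the last wait (e.g. EINTR).
 */
static int
wait_loop(work_fn work_cb, wait_fn wait_cb, void *ctx)
{
    bool should_exit;
    int err;

    assert(work_cb != NULL && wait_cb != NULL);
    do {
        should_exit = work_cb(ctx);  /* the original holds a mutex around this */
        err = (wait_cb(ctx) == -1) ? errno : 0;
        /* A timeout (EWOULDBLOCK) is expected; any other error ends the loop. */
    } while ((err == 0 || err == EWOULDBLOCK) && !should_exit);
    return err;
}

/* Fake driver state, used only to try the loop without real hardware. */
struct fake_driver {
    int calls;       /* number of waits issued so far */
    int fail_at;     /* which wait fails (0 = never) */
    int fail_errno;  /* errno reported by the failing wait */
    int done_at;     /* work reports "exit" once calls reaches this */
};

static bool
fake_work(void *ctx)
{
    struct fake_driver *d = ctx;
    return d->calls >= d->done_at;
}

static int
fake_wait(void *ctx)
{
    struct fake_driver *d = ctx;
    if (++d->calls == d->fail_at) {
        errno = d->fail_errno;
        return -1;
    }
    return 0;
}
```

With the fake driver, a wait that fails with EINTR ends the loop at once
instead of spinning, while a wait that times out with EWOULDBLOCK is
simply retried until the work step asks to exit.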

Even if this is our bug, I think that a user-level bug like this should
not be able to deadlock the system... 

FWIW, even with the fix to the user-level code, we still have the
original problem (one lingering thread using no CPU) in RELENG_5.

Drew




