NFS locking: lockf freezes (rpc.lockd problem?)

Oliver Fromme olli at lurza.secnetix.de
Mon Aug 28 12:21:13 UTC 2006


Michael Abbott wrote:
 > What about the non-interruptible sleep?  Is this regarded as par for the 
 > course with NFS, or as a problem?
 > 
 > I know that "hard" NFS mounts are treated as completely unkillable, though 
 > why `kill -9` isn't made to work escapes me, but a locking operation which 
 > (presumably) suffers a protocol error?  Or is rpc.lockd simply waiting to 
 > hear back from the (presumably broken) NFS server?  Even so: `kill -9` 
 > ought to work!

SIGKILL _does_ always work.  However, signal processing can
be delayed for various reasons.  For example, if a process
is stopped (SIGSTOP), most other signals will only take
effect when it continues (SIGCONT).
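
The stop/continue case can be tried out with a short script.
This is only a sketch: the function name and the throwaway
child process are made up for illustration, and it uses
SIGTERM rather than SIGKILL, because POSIX lets SIGKILL
terminate even a stopped process.

```python
import os
import signal
import subprocess
import sys
import time


def sigterm_deferred_while_stopped():
    """Stop a child, send it SIGTERM, and see when the signal takes effect.

    Returns (alive_while_stopped, terminating_signal).
    Hypothetical helper written for this illustration only.
    """
    # A throwaway child that would otherwise just sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        child.send_signal(signal.SIGSTOP)
        # Wait until the kernel reports the child as stopped.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        if not os.WIFSTOPPED(status):
            raise RuntimeError("child did not stop as expected")

        # SIGTERM is now recorded as pending, but not yet acted upon.
        child.send_signal(signal.SIGTERM)
        time.sleep(0.5)
        alive_while_stopped = child.poll() is None

        # Once the child continues, the pending SIGTERM is processed.
        child.send_signal(signal.SIGCONT)
        child.wait(timeout=10)
        return alive_while_stopped, signal.Signals(-child.returncode)
    finally:
        # Clean up if anything above went wrong.
        if child.poll() is None:
            child.kill()
            child.wait()
```

Calling sigterm_deferred_while_stopped() should report that the
child was still alive while stopped and that it was eventually
terminated by SIGTERM.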

Signal processing does not occur if a process is currently
not scheduled, which is the case if the process is blocked
on I/O (indicated by "D" in the STAT column of ps(1), also
called the "disk-wait" state).  That can happen if the
hardware is broken (disk, controller, cable), so an I/O
request doesn't return.  It can also happen if there are
NFS hiccups, as seems to be the case here.
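
To see which processes are currently in that state, the STAT
column can be filtered.  A minimal sketch follows; the function
names are hypothetical, and the ps(1) invocation is an
assumption about the local ps (the keywords pid, stat and comm
are accepted by FreeBSD and by Linux procps, but other systems
may differ).

```python
import subprocess


def disk_wait_processes(ps_output):
    """Return (pid, stat, command) for rows whose STAT starts with "D".

    Expects text with a header line followed by PID, STAT and COMMAND
    columns, as printed by "ps ax -o pid,stat,comm".
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[1].startswith("D"):
            rows.append((int(fields[0]), fields[1], fields[2]))
    return rows


def current_disk_wait_processes():
    """Run ps(1) and filter its output (assumes the flags below work)."""
    result = subprocess.run(
        ["ps", "ax", "-o", "pid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return disk_wait_processes(result.stdout)
```

A process that stays in this list for a long time, such as a
lockf stuck on an NFS mount, is the kind of process for which
a pending SIGKILL has not been processed yet.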

As soon as the "D" state ends, the process becomes runnable
again (i.e. it's put on the scheduler's "run queue"), which
means that it'll get a CPU share, and the SIGKILL signal
that you sent it before will be processed, finally.

Some background information:  Each process has a bit mask
which stores the set of received signals.  kill(2) (and
therefore also kill(1)) only sets a bit in that bit mask.
The next time the process is scheduled onto a CPU, the mask
of received signals is processed and acted upon.  That's
not FreeBSD-specific; it works like that on almost all UNIX
systems.  Why does it work that way?  Well, if signals were
processed for processes not on the CPU, then there would be
a "hole":  A process would be able to circumvent the
scheduler, because signal processing happens on behalf of
the process, which means that it runs with the credentials,
resource limits, nice value etc. of that process.  Well, in
theory, a special case could be made for SIGKILL, but it's
quite difficult if you don't want to break existing
semantics (or create holes).
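
The "set a bit now, act on it later" behaviour can be observed
from user space.  The sketch below is an analogy only: it
blocks a signal with a signal mask instead of keeping the
process off the CPU, which is an assumption made so that the
effect is reproducible.  The function name is hypothetical, and
it is meant to be run from the main thread of a single-threaded
program.

```python
import os
import signal


def show_pending_then_delivered(signum=signal.SIGUSR1):
    """Send ourselves a blocked signal and watch it sit in the pending set.

    Returns (was_pending, ran_while_blocked, delivered).
    """
    delivered = []
    old_handler = signal.signal(signum, lambda s, frame: delivered.append(s))
    # Block the signal so the kernel can only record it as pending.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # kill(2) merely marks the signal as pending for this process.
        os.kill(os.getpid(), signum)
        was_pending = signum in signal.sigpending()
        ran_while_blocked = bool(delivered)
    finally:
        # Restoring the old mask unblocks the signal; only now is the
        # pending signal acted upon and the handler run.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(
            signum,
            old_handler if old_handler is not None else signal.SIG_DFL,
        )
    return was_pending, ran_while_blocked, delivered
```

While the signal is blocked it shows up in the pending set and
the handler has not run; the handler runs only after the mask
is restored, much like a pending SIGKILL is processed only once
the process gets onto a CPU again.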

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"UNIX was not designed to stop you from doing stupid things,
because that would also stop you from doing clever things."
        -- Doug Gwyn


More information about the freebsd-stable mailing list