namei lookup vnode locking
srajag00 at yahoo.com
Wed Dec 26 17:13:37 PST 2007
I encountered an issue with FreeBSD 6.1 and would
appreciate some feedback on this. The problem happens
when a perl script running on some client system does
a telnet into the FreeBSD box, exits from the login
shell and immediately exits the perl script too. After
the script exits, there is a deadlock on the FreeBSD
box that prevents new processes (such as ps, top etc.)
from starting. Upon investigation, this seems to be
caused by the following sequence of events on the
FreeBSD box:
1) The login process exits. The exit call in the kernel
closes all file descriptors. One of these is the fd for
/dev/ttyp0, used for the telnet session. login locks
the vnode for /dev/ttyp0 and waits up to 5 minutes for
the tty to drain (ttywait() call).
2) The tty is supposed to be drained by telnetd.
However, telnetd sees the network connection go
down when the perl script exits. As a result, it
jumps to cleanup code, where it tries to do chmod
on /dev/ttyp0. The chmod syscall attempts to lock
/dev/ttyp0, but blocks because the lock is held by
login, so the telnetd process is put to sleep.
Meanwhile, telnetd still holds the lock on the vnode
for /dev.
It appears that the lock was acquired when doing the
namei lookup for /dev/ttyp0. The current state is
that there is output in the tty that has to be
read by telnetd, but it can't because it is sleeping
for the /dev/ttyp0 lock. telnetd is holding the
/dev vnode lock while sleeping.
3) As a result, any process that needs the /dev
vnode lock is put to sleep for 5 minutes (ttywait
waits for a default of 5 minutes). Even if a
process wants to open an unrelated device file,
/dev/foo, it is not able to do so because the /dev
lock is held by telnetd.
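
The chain of waits in steps 1-3 can be reproduced in a small
user-space model. This is only a sketch, not kernel code: plain
mutexes stand in for the two vnode locks, the 5-minute ttywait is
scaled down to a fraction of a second, and every name in it
(simulate, login_exit, telnetd_cleanup, bystander) is made up for
illustration.

```python
import threading
import time


def simulate(drain_timeout=0.3):
    """Model the stall: login holds ttyp0, telnetd holds /dev, others wait.

    drain_timeout stands in for the 5-minute ttywait (scaled down).
    Returns the ordered event log and how long the bystander waited.
    """
    dev_lock = threading.Lock()      # stands in for the /dev vnode lock
    tty_lock = threading.Lock()      # stands in for the /dev/ttyp0 vnode lock
    tty_locked = threading.Event()   # login has taken the ttyp0 lock
    dev_locked = threading.Event()   # telnetd has taken the /dev lock
    drained = threading.Event()      # never set: telnetd cannot read the tty
    events = []
    result = {}

    def login_exit():
        # Step 1: close() on the tty locks the vnode, then waits for drain.
        with tty_lock:
            tty_locked.set()
            drained.wait(drain_timeout)
            events.append("login: drain wait timed out")

    def telnetd_cleanup():
        # Step 2: the lookup locks /dev, then blocks on /dev/ttyp0.
        tty_locked.wait()
        with dev_lock:
            dev_locked.set()
            with tty_lock:
                events.append("telnetd: chmod /dev/ttyp0")

    def bystander():
        # Step 3: an unrelated lookup of /dev/foo needs the /dev lock.
        dev_locked.wait()
        start = time.monotonic()
        with dev_lock:
            result["waited"] = time.monotonic() - start
            events.append("bystander: opened /dev/foo")

    threads = [
        threading.Thread(target=f)
        for f in (login_exit, telnetd_cleanup, bystander)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events, result["waited"]


if __name__ == "__main__":
    log, waited = simulate()
    for line in log:
        print(line)
    print("bystander waited %.2f s" % waited)
```

In this model the bystander is stalled for roughly the whole drain
timeout even though it never touches ttyp0, which matches what I see
with ps and top on the real box.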
Questions:
1) Does namei lookup need to acquire an exclusive
lock on intermediate vnodes when looking up a pathname?
That is, if telnetd is trying to look up /dev/ttyp0,
does it need to get an exclusive lock on /dev? Can it
be a shared lock that will allow at least other
processes to make progress?
2) Besides relaxing the locking above, any other
thoughts on how to fix this? Reducing the tty timeout
from the close routine is another option, but that
only limits the duration of the deadlock.
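
To make question 1 concrete, here is a toy comparison of exclusive
versus shared locking on the parent directory. It is a sketch under
assumptions, not a claim about how namei or the kernel lock manager
behaves: SharedExclusiveLock is a hypothetical, unfair
reader/writer lock written for this example, and bystander_wait is
likewise invented. The first lookup holds the /dev lock for a short
time (standing in for telnetd blocked on /dev/ttyp0) while a second
lookup tries to take the same lock.

```python
import threading
import time


class SharedExclusiveLock:
    """Toy shared/exclusive lock (no fairness; illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def bystander_wait(shared_parent, hold=0.3):
    """Return how long a second lookup waits for the /dev lock.

    The first lookup holds the lock for `hold` seconds, standing in
    for telnetd sleeping on /dev/ttyp0 while it holds /dev.
    """
    dev = SharedExclusiveLock()
    lock = dev.acquire_shared if shared_parent else dev.acquire_exclusive
    unlock = dev.release_shared if shared_parent else dev.release_exclusive
    held = threading.Event()

    def blocked_lookup():
        lock()
        held.set()
        time.sleep(hold)   # blocked on the child vnode
        unlock()

    t = threading.Thread(target=blocked_lookup)
    t.start()
    held.wait()
    start = time.monotonic()
    lock()                 # the unrelated lookup of /dev/foo
    waited = time.monotonic() - start
    unlock()
    t.join()
    return waited


if __name__ == "__main__":
    print("exclusive /dev lock: waited %.2f s" % bystander_wait(False))
    print("shared /dev lock:    waited %.2f s" % bystander_wait(True))
```

With the exclusive lock the second lookup waits for the full hold
time; with the shared lock it proceeds at once. Note that in both
cases telnetd itself stays blocked on ttyp0 until login's wait
expires, so a shared parent lock would only keep the rest of the
system responsive.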