nvi dying with "Resource temporarily unavailable" [SOLVED]

Stephen McKay smckay at internode.on.net
Wed Apr 27 21:14:09 PDT 2005


This is resurrecting an old thread, but I'd like the answer to be found
in searches, so here goes:

On Monday, 25th August 2003, Stephen McKay wrote:

>On Saturday, 9th August 2003, Doug White wrote:
>
>>On Fri, 8 Aug 2003, Stephen McKay wrote:
>>
>>> >Stephen McKay wrote:
>>> >> Since I upgraded to FreeBSD 4.8 (from 4.5) I've noticed occasional failures
>>> >> of nvi.  It will suddenly die as a key is pressed, emitting:
>>> >>
>>> >> Error: input: Resource temporarily unavailable
>>
>>We went round and round on irc about this a few weeks back.  We pinned it
>>down to a bad error check in nvi.  Unfortunately the fix was non-obvious.
>>There's a read() that needs to check for EAGAIN and loop back around on
>>the read.  If someone wants to take a crack at this, the offending read()
>>is at common/cl_read.c line 266.

>You almost had me convinced until I got this extraordinary result:
>
>$ cat
>cat: stdin: Resource temporarily unavailable
>$
>
>Never seen the like in all my born days.
>
>I'm running zsh 3.1.9 on FreeBSD 4.8-RELEASE.  /bin/cat is a simple program.
>If it isn't working properly, there's a fault in zsh or the kernel.  I have
>not upgraded zsh since July 2000.  That makes it a bug in the kernel.  What
>else could it be?  At a stretch perhaps a bug in libc.  Nothing else comes
>to mind.

The fault lies in libc_r.  I don't yet know how to fix libc_r, or even if
it will ever be fixed, but I have installed a protective mechanism on my
system that I am willing to live with.

One of the things that libc_r does is a trick to allow I/O in one thread
to not block other threads: it sets all file descriptors to nonblocking,
including 0, 1 and 2.

Descriptors 0, 1 and 2 normally refer to your tty (usually a pty nowadays)
unless you go to the trouble of redirecting them.  A further twist is that
these descriptors are not the result of reopening your tty, but come from
dup() and are inherited across fork().  Hence they share the underlying
file status flags, O_NONBLOCK included, with all other processes in that
session, including your shell and any processes it starts before or after
the one that uses libc_r.
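
For anyone who wants to see the sharing for themselves, here is a minimal
sketch in C.  It is only an illustration: a pipe stands in for the tty, and
the function name nonblock_leaks_through_dup() is made up for this example.
It sets O_NONBLOCK through a dup()ed copy and then looks at the flags
through the original descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if setting O_NONBLOCK on a dup()ed descriptor is also
 * visible through the original descriptor, 0 if it is not, and -1 on
 * error.  A pipe is used in place of a tty (an assumption made so the
 * demonstration does not need a terminal).
 */
int
nonblock_leaks_through_dup(void)
{
	int p[2];
	int copy, flags, result;

	if (pipe(p) == -1)
		return (-1);

	copy = dup(p[0]);
	if (copy == -1) {
		close(p[0]);
		close(p[1]);
		return (-1);
	}

	/* Set O_NONBLOCK through the copy only. */
	flags = fcntl(copy, F_GETFL);
	if (flags == -1 || fcntl(copy, F_SETFL, flags | O_NONBLOCK) == -1) {
		result = -1;
	} else {
		/* Now inspect the flags through the original descriptor. */
		flags = fcntl(p[0], F_GETFL);
		result = (flags == -1) ? -1 : ((flags & O_NONBLOCK) != 0);
	}

	close(copy);
	close(p[0]);
	close(p[1]);
	return (result);
}
```

The function should return 1, because both descriptors refer to the same
open file.  Descriptors inherited across fork() behave the same way, which
is how one process can change the mode seen by every other process on the
tty.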

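Setting nonblocking mode on a shared descriptor like this affects *all*
processes using it.  In other words, your shell, nvi, cat, and indeed
all other programs on that tty now have a nonblocking descriptor for it.
Having stdin or stdout suddenly become nonblocking causes many programs
to fail mysteriously: a read() that would normally wait for input instead
fails with EAGAIN ("Resource temporarily unavailable"), which is exactly
the error nvi and cat reported above.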

In short, just running a program linked against libc_r in the background
can cause other programs to fail.  This is clearly unacceptable.

It has taken me quite some years to track this down and it has almost made
me lose faith in FreeBSD.  (Why would anyone use an OS that fails randomly?)
It's especially illuminating (from a programming point of view) that the
root cause is in a subsystem I've never used and hence never examined all
those times I went looking for the problem.  It's other programmers using
libc_r in more and more programs (some of which I use without even knowing
it) that has caused this slow degradation of my FreeBSD experience.

How many other bugs like this are hidden in the ever increasing complexity
of FreeBSD (or indeed any other software)?  Unexpected interactions are
everywhere and we should work hard to minimise them!

OK, enough of the rambling philosophy: How can this be prevented?

As described in this posting:

http://lists.freebsd.org/mailman/htdig/freebsd-hackers/2005-January/009742.html

I have added code to my 4.11 kernel to prevent background processes from
setting O_NONBLOCK on ttys.  I've been running with this for over 3 months
and in that time have had no unexpected nvi exits or other weirdness.

I believe this is a cure.  I also believe that no process can reasonably
expect to set O_NONBLOCK on its tty when in the background and hence I think
this should be added to -current.
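
To make the policy concrete, here is a userland analogue in C.  This is
NOT the kernel change from the posting above, just a sketch of the same
rule applied voluntarily by a program.  The function names and the choice
of EPERM as the error are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if fd is a tty and the caller's process group is not the
 * tty's foreground process group, 0 if fd is not a tty or the caller
 * is in the foreground, and -1 on error.
 */
int
is_background_on_tty(int fd)
{
	pid_t fg;

	if (!isatty(fd))
		return (0);
	fg = tcgetpgrp(fd);
	if (fg == -1)
		return (-1);
	return (fg != getpgrp());
}

/*
 * Set O_NONBLOCK on fd unless doing so would change a tty's mode from
 * the background.  EPERM is an arbitrary choice for this sketch.
 */
int
set_nonblock_guarded(int fd)
{
	int bg, flags;

	bg = is_background_on_tty(fd);
	if (bg == -1)
		return (-1);
	if (bg == 1) {
		errno = EPERM;
		return (-1);
	}
	flags = fcntl(fd, F_GETFL);
	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK));
}
```

Pipes, sockets and ordinary files are unaffected by the check.  Only a
background process touching its tty is refused.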

But the side effect of the cure is that you cannot start a threaded
program in the background without redirecting stdin, stdout and stderr
elsewhere.  I accept this as a cost of fixing the problem.  You may not
be so generous.  If so, perhaps you can think of a way of fixing libc_r
directly.

Personally, I'd be happy enough to prevent the damage (by banning background
O_NONBLOCK on ttys) while waiting for libc_r to die a natural death as the
other threading libraries in 5.x and 6.x take over.

Stephen.

