kern/94772: FIFOs (named pipes) + select() == broken
Bruce Evans
bde at zeta.org.au
Sun Mar 26 12:00:42 UTC 2006
The following reply was made to PR kern/94772; it has been noted by GNATS.
From: Bruce Evans <bde at zeta.org.au>
To: Oliver Fromme <olli at lurza.secnetix.de>
Cc: bug-followup at freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Sun, 26 Mar 2006 22:50:17 +1100 (EST)
On Fri, 24 Mar 2006, Oliver Fromme wrote:
I'm still catching up with your mail from Thursday-Friday: this one
and the one with the main patch. I tested and debugged the
patch and found a few problems and many more complications...
> Bruce Evans wrote:
> > Oliver Fromme wrote:
> > > So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
> > > set without checking if the caller has requested POLLOUT in
> > > the events mask? That sounds reasonable, because POLLOUT
> > > certainly can't be returned in that case. It makes the
> > > code more complex, though.
> >
> > Yes. POLLHUP is also needed for making poll() return for poll()
> > waiting for input only. I think it would make the code slightly
> > less complex.
>
> You're right. My patch made that part of the code slightly
> less complex, indeed.
It tests both SBS_CANTSENDMORE and SBS_CANTRCVMORE. Testing both
seems to be needed, but after my changes things got more complicated
again. For fifos there are 2 sockets each with these 2 flags, so
there are 2**4 combinations of flags to consider. When we set
POLLHUP we are supposed to not set POLLOUT, but even when we force
this in sopoll() we have to worry about fifo_poll() ORing POLLHUP
for the read socket together with POLLOUT for the write socket.
Anyway, userland is not ready for POLLHUP, so I think we shouldn't
add it to sopoll() yet.
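
To illustrate the POLLHUP/POLLOUT conflict described above, here is a
minimal user-space sketch. It is not the kernel's sopoll() or fifo_poll()
code; the function name fifo_merge_revents and the choice of flags kept
from each side are assumptions made purely for illustration. It models a
fifo as two per-socket results that get ORed together, and shows that the
"no POLLOUT with POLLHUP" rule has to be enforced after the merge, not
just within each socket's result.

```c
#include <assert.h>
#include <poll.h>

/*
 * Hypothetical model, not kernel code: combine the poll result of a
 * fifo's read socket with that of its write socket.
 */
short fifo_merge_revents(short rd_revents, short wr_revents)
{
    /* Naive merge: this is where POLLHUP and POLLOUT can end up together. */
    int revents = (rd_revents & (POLLIN | POLLHUP)) |
                  (wr_revents & (POLLOUT | POLLHUP));

    /* Enforce the rule on the merged result: POLLHUP excludes POLLOUT. */
    if (revents & POLLHUP)
        revents &= ~POLLOUT;

    assert(!((revents & POLLHUP) && (revents & POLLOUT)));
    return (short)revents;
}
```

With this model, a read side reporting POLLHUP and a write side reporting
POLLOUT merge to POLLHUP alone, which is exactly the case in which a
program waiting only for POLLOUT never sees it.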
> > I'm interested in what non-Linux non-FreeBSD systems do.
>
> DEC UNIX 4.0D doesn't return POLLHUP at all, only POLLIN.
> ...
> Solaris 9 seems to behave exactly the same as Linux in the
> ...
>
> NetBSD 3.0 is very interesting, so I give the detailed
> output from the test program (which I modified to produce
> regression test compliant output, see my other mail):
I've only looked at NetBSD-2.0.1 sources. These seem to still have
some of the bugs in 4.4BSD that I fixed. NetBSD-3.0 seems to be better.
> 1..26
> ok 1 Pipe state 4: expected 0; got 0
> ok 2 Pipe state 5: expected POLLIN; got POLLIN
> ok 3 Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
> not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
I think we'll need to go back to this (always return POLLIN with POLLHUP).
I found that lat_rpc in lmbench2 is broken without this. At least in my
old version of libc, libc/rpc uses poll() a lot, and it doesn't understand
POLLHUP. E.g., at EOF read_vc() spins forever waiting for POLLIN unless
POLLIN is set together with POLLHUP.
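
For comparison, a reader that does understand POLLHUP can avoid the spin
by treating a hangup the same as readability and letting read() report
EOF. The following is a minimal sketch, not the libc/rpc code; the helper
name wait_readable is hypothetical, and EINTR handling is left out for
brevity.

```c
#include <assert.h>
#include <poll.h>

/*
 * Hypothetical helper: wait until read() on fd would not block.
 * Returns 1 if data, EOF/hangup or an error is pending, 0 on timeout,
 * -1 if poll() itself failed.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;   /* 0: timeout, -1: error (errno set by poll) */

    /*
     * POLLHUP, POLLERR and POLLNVAL can be reported even though only
     * POLLIN was requested, and may arrive without POLLIN. Treat all of
     * them as "go and call read()" instead of waiting for POLLIN alone.
     */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)) ? 1 : 0;
}
```

A caller would follow a return of 1 with read() and treat a result of 0
bytes as EOF. This works whether the system reports EOF as POLLIN, as
POLLHUP, or as both.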
> ok 5 Pipe state 4: expected 0; got 0
> ok 6 Pipe state 5: expected POLLIN; got POLLIN
> ok 7 Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
> not ok 8 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
Same.
> ok 9 FIFO state 0: expected 0; got 0
> ok 10 FIFO state 1: expected 0; got 0
> ok 11 FIFO state 2: expected POLLIN; got POLLIN
> ok 12 FIFO state 2a: expected 0; got 0
> not ok 13 FIFO state 3: expected POLLHUP; got POLLIN
Similarly. I changed your patches to return both POLLHUP and POLLIN here.
(This required complications to zap POLLIN as well as POLLHUP in state 0.)
I thought that returning POLLHUP would be harmless, but it isn't for
output, since returning POLLHUP requires not returning POLLOUT, so
programs that don't understand POLLHUP might spin at EOF for write by
waiting for POLLOUT.
> ok 14 FIFO state 4: expected 0; got 0
> ok 15 FIFO state 5: expected POLLIN; got POLLIN
> not ok 16 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN
Similarly. For this state, we could fix the bug in gdb (premature exit
on POLLHUP when POLLIN is also set and actually indicates non-null data)
by returning only POLLIN. This would only work for polling for readability.
For writability, POLLHUP needs to be returned synchronously if at all, to
give the application a chance of avoiding a write that would fail.
select()'s interface, together with returning POLLOUT on EOF, presumably
results in lots of processes being killed by SIGPIPE when they try such a write.
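
As a sketch of what "a chance of avoiding a write that would fail" can
look like from the application side, the following checks for a hangup or
error before writing. The helper name check_writable is hypothetical, and
which of POLLHUP or POLLERR a given system reports for a vanished reader
is system-dependent, so the sketch accepts either.

```c
#include <assert.h>
#include <poll.h>

/*
 * Hypothetical helper: returns 1 if fd is writable, 0 if a write would
 * block (timeout expired), -1 if the peer is gone or poll() failed.
 */
int check_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;

    /* Look at the hangup/error bits first; POLLOUT may also be set. */
    if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
        return -1;
    return (pfd.revents & POLLOUT) ? 1 : 0;
}
```

The check is inherently racy, since the reader can still go away between
poll() and write(). A program that must not be killed still needs to
ignore or handle SIGPIPE and check write() for EPIPE.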
> not ok 17 FIFO state 6a: expected POLLHUP; got POLLIN
Same as for pipes.
[... same for second iteration]
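
For readers who want to reproduce output in the format quoted above, here
is a minimal sketch of a reporting helper. It is not the actual test
program from the PR; the names revents_str and report are hypothetical,
and only the POLLIN and POLLHUP bits are decoded because those are the
only ones that appear in the quoted results.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Append one flag name to buf, separated by " | " if buf is non-empty. */
static void append_flag(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);

    snprintf(buf + used, len - used, "%s%s", used ? " | " : "", name);
}

/* Render the POLLIN/POLLHUP bits of revents; other bits are ignored. */
const char *revents_str(short revents, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (revents & POLLIN)
        append_flag(buf, len, "POLLIN");
    if (revents & POLLHUP)
        append_flag(buf, len, "POLLHUP");
    if (buf[0] == '\0')
        append_flag(buf, len, "0");
    return buf;
}

/* Print one "ok"/"not ok" line; returns 1 if got matches expected. */
int report(int num, const char *what, short expected, short got)
{
    char e[32], g[32];
    int ok = (expected == got);

    printf("%s %d %s: expected %s; got %s\n", ok ? "ok" : "not ok",
        num, what, revents_str(expected, e, sizeof(e)),
        revents_str(got, g, sizeof(g)));
    return ok;
}
```

For example, report(4, "Pipe state 6a", POLLHUP, POLLIN | POLLHUP) prints
"not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP".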
> That means two things:
> 1. When POLLHUP is returned, POLLIN is also always
> returned.
> 2. For FIFOs, POLLHUP is not used at all, but POLLIN
> is used instead. This is the behaviour that Stevens
> describes in APUE, by the way.
>
> I guess portable programs cannot rely on the results from
> poll() too much ... They probably just look if at least
> one of POLLHUP and POLLIN is set, and then call read().
> Otherwise they would break on one platform or another.
Not supporting POLLHUP for pipes and fifos seems best. We have
to set POLLIN on EOF since too many programs only look at POLLIN.
Then setting POLLHUP doesn't gain much. It's strange to support
POLLHUP for pipes but not for fifos. It is easier to support for
pipes but more useful for fifos.
> Here's a web page from someone who did similar tests on
> a wide range of operating systems:
>
> http://www.greenend.org.uk/rjk/2001/06/poll.html
>
> His conclusions are a little bit different. *SIGH*
> It's all the fault of fuzzy SUS/POSIX. :-(
Urk. It shows about 50 variations in 12 OS's without even checking
fifos.
We need more regression tests for sockets if we're going to change
sopoll() significantly. I hacked the tests to check socketpair()
(just change pipe() to socketpair(...)). Pipes were once just
socketpairs but are now handled specially, and this gives more
variations. Fortunately not many. Before your changes, there are
no differences for select(), and for poll() there are these:
before:
< ok 3 Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
< not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
after:
> not ok 3 Socketpair state 6: expected POLLIN | POLLHUP; got POLLIN
> not ok 4 Socketpair state 6a: expected POLLHUP; got POLLIN
We just lose all setting of POLLHUP, and this only makes a difference
here. (State 6a is the only problem case for pipes; socketpair()
has this and a problem with state 6 too.)
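
A minimal sketch of this kind of pipe-versus-socketpair comparison for
state 6a (reader with no data, writer gone) follows. It is not the
regression test from the PR. The function name is hypothetical, and the
socketpair() arguments (AF_UNIX, SOCK_STREAM, 0) are an assumption of
this sketch, since the arguments are not given above.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical probe: create a pipe or a socketpair, close one end with
 * no data pending, and return what poll() reports on the other end when
 * asked for POLLIN. Returns -1 on failure.
 */
int revents_after_peer_close(int use_socketpair)
{
    struct pollfd pfd;
    int fds[2];
    int rc;

    assert(use_socketpair == 0 || use_socketpair == 1);

    /* Assumed arguments; the original mail elides them. */
    rc = use_socketpair ? socketpair(AF_UNIX, SOCK_STREAM, 0, fds)
                        : pipe(fds);
    if (rc != 0)
        return -1;

    close(fds[1]);              /* the "writer" goes away */

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, 0);      /* zero timeout: just sample the state */
    close(fds[0]);

    return rc < 0 ? -1 : pfd.revents;
}
```

Comparing revents_after_peer_close(0) with revents_after_peer_close(1) on
a given system shows whether it reports POLLIN, POLLHUP or both in this
state, and whether pipes and socketpairs agree.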
After your changes there are no differences for pipes and socketpairs.
With my version of your changes there is a difference for state 6a again:
before:
< not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
after:
> ok 4 Socketpair state 6a: expected POLLHUP; got POLLHUP
My changes are supposed to always set POLLIN with POLLHUP (giving "not ok"
in state 6a), and they somehow do that in sopoll() for fifos but not for
socketpairs.
Linux-2.6.10 has the following problem cases:
select():
% not ok 9 FIFO state 0: expected set; got clear
Linux apparently doesn't have a special case for state 0 in fifos
(reader with no data, no writer and no disconnection) -- it has the
same behaviour in this state for select() as for poll() although this
behaviour is clearly nonstandard for select().
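
A minimal sketch for observing the state 0 behaviour with select() is
shown below. It is not the test program from the PR; the function name,
the temporary path under /tmp and the use of a non-blocking open are
assumptions of this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe for "FIFO state 0": open a fresh fifo for reading
 * with no writer and no data, then ask select() whether it is readable.
 * Returns 1 if the descriptor is reported readable, 0 if not, -1 if the
 * setup or select() failed.
 */
int fifo_state0_select(void)
{
    char dir[] = "/tmp/fifosel.XXXXXX";   /* assumed scratch location */
    char path[64];
    struct timeval tv = { 0, 0 };         /* poll once, do not sleep */
    fd_set rfds;
    int fd, n = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/fifo", dir);

    if (mkfifo(path, 0600) == 0) {
        /* O_NONBLOCK so that open() does not wait for a writer. */
        fd = open(path, O_RDONLY | O_NONBLOCK);
        if (fd >= 0) {
            assert(fd < FD_SETSIZE);
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            n = select(fd + 1, &rfds, NULL, NULL, &tv);
            if (n >= 0)
                n = FD_ISSET(fd, &rfds) ? 1 : 0;
            close(fd);
        }
        unlink(path);
    }
    rmdir(dir);
    return n;
}
```

The result is deliberately not interpreted here: as the test output above
shows, systems disagree on whether the descriptor is "set" or "clear" in
this state.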
poll():
% not ok 4 Socketpair state 6a: expected POLLHUP; got POLLIN | POLLHUP
In this state (reader with no data and a disconnection), Linux has
simpler behaviour that is inconsistent with Linux's pipe().
I don't know socket programming well enough to quickly write similar
tests for general connections.
Bruce