kern/175674: sem_open() should use O_EXLOCK with open() instead of a separate flock() call

Mon Feb 4 06:36:52 UTC 2013

On Sun, 3 Feb 2013, Giorgos Keramidas wrote:

> > For a reason unknown to me, open(2) does not restart but always
> > returns [EINTR] when a signal is caught. This is not POSIX-compliant.

Actually, it is restarting that would be POSIX-non-compliant.  From
an old POSIX draft:

@ 27392 ERRORS
@ 27393            The open( ) function shall fail if:
@ ...
@ 27399            [EINTR]             A signal was caught during open( ).

Since it says "shall", this is not optional.

> I see where kern_openat() returns an error when vn_open is interrupted:
>
> 1083         error = vn_open(&nd, &flags, cmode, fp);
> 1084         if (error) {
> ....
> 1109                 if (error == ERESTART)
> 1110                         error = EINTR;
> 1111                 goto bad;
> 1112         }

This code is wrong for a different reason.  Some lower layers return
ERESTART when there was no interrupt, because they actually want to
restart.  The tty top layer and some drivers at least used to do this.
This code breaks the restart, and also confuses applications by returning
EINTR when there is no interrupt.

The above code is backwards compared with the usual handling of EINTR
that is done for example in read().  Lower layers should return raw
EINTR and let upper layers convert it to ERESTART, but the above does
the reverse.  I don't know why it does that.  This hasn't changed since
at least FreeBSD-1.
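
The two directions can be contrasted in a toy userland model.  This is
only a sketch, not kernel code: the names TOY_EINTR, TOY_ERESTART,
read_style_fixup() and openat_style_fixup() are made up for
illustration, and the "signal wants restart" flag is an assumed stand-in
for whatever the upper layer consults to decide whether to restart.

```c
/* Toy stand-ins for the kernel's error codes (illustrative values only). */
enum { TOY_EINTR = 4, TOY_ERESTART = -1 };

/*
 * The "usual" direction as described above: the lower layer reports a
 * raw EINTR and the upper layer decides whether to turn it into a
 * restart.  sig_wants_restart is a hypothetical flag.
 */
static int
read_style_fixup(int error, int sig_wants_restart)
{
	if (error == TOY_EINTR && sig_wants_restart)
		return (TOY_ERESTART);
	return (error);
}

/*
 * The direction taken by the quoted kern_openat() fragment: any
 * ERESTART from below becomes EINTR, even when the lower layer asked
 * for a restart and no signal was involved.
 */
static int
openat_style_fixup(int error)
{
	if (error == TOY_ERESTART)
		return (TOY_EINTR);
	return (error);
}
```

In this model a lower layer that returns TOY_ERESTART purely to request
a restart comes out of openat_style_fixup() as TOY_EINTR, which is the
breakage described above.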

> > The best way to fix this is in kern_openat() in the kernel but this
> > might cause compatibility issues.
>
> Not sure if there would be serious compatibility problems if open() would
> automatically restart instead of returning EINTR.  It definitely seems a rather
> intrusive change though.

It would probably break something even if restarting were an option.  But
probably the breakage wouldn't be very serious.

The EINTR handling is more of a problem for close().  The old POSIX
draft says much the same for close():

@ 6918              If close( ) is interrupted by a signal that is to be caught, it shall return -1 with errno set to [EINTR]
@ 6919              and the state of fildes is unspecified. If an I/O error occurred while reading from or writing to the
@ 6920              file system during close( ), it may return -1 with errno set to [EIO]; if this error is returned, the
@ 6921              state of fildes is unspecified.

But this behaviour is unusable.  EINTR from open() is easy to recover from
in the application, but EINTR with the above specification is impossible
to recover from.  The above specification makes it even more unrecoverable
than EIO, since it only says "may fail" for EIO.  The latter requires the
system to try very hard to recover before actually returning EIO.  But any
reasonable signal handling doesn't have the option of not returning EINTR.
What it can do is try very hard to put fildes in a good state before
returning.  But without this state being specified, the application cannot
recover.  Moreover, most applications don't even check for success of
close().  This gives broken behaviour for EIO too, but EIO is rare and
usually really is unrecoverable.
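
To make the contrast concrete, here is roughly what an application can
do.  This is a sketch under assumptions: open_retry() and
close_noretry() are hypothetical helper names, not anything in libc or
in the PR.  Retrying open() is safe because a failed open() leaves no
descriptor behind; close() is called exactly once because, with the
state of fildes unspecified after EINTR, a second close() could hit a
descriptor that another thread has since reused.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: retry open() for as long as it fails with
 * EINTR.  Any other error is returned to the caller with errno intact.
 */
static int
open_retry(const char *path, int flags, mode_t mode)
{
	int fd;

	do {
		fd = open(path, flags, mode);
	} while (fd == -1 && errno == EINTR);
	return (fd);
}

/*
 * Hypothetical helper: call close() once and never retry.  The error
 * is reported, but whatever it is, the caller must treat fd as no
 * longer usable, since its state is unspecified after a failure.
 */
static int
close_noretry(int fd)
{
	if (close(fd) == -1)
		return (-1);	/* errno is set; do not call close() again */
	return (0);
}
```

The asymmetry is the point: the first helper can loop forever without
harm, while the second has nothing useful to do with EINTR except
report it.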

Returning of EINTR from close() was discussed in the POSIX list last
year.  I didn't like the results.  Most systems are very broken in
this area.  POSIX requires close() on ttys to flush input, but to wait
(possibly forever, but mostly limited by the length of the sysadmin's
holidays) for any buffered output to drain.  This is unusable, and
most systems don't comply with it and have many bugs in their
non-compliance.  However, I just noticed that the above part of the
spec allows almost any mishandling for EINTR -- since the state of
fildes is unspecified, it can be anything, so weaselnix is not
non-compliant when it flushes output before returning EINTR.

Bruce