[Bug 203162] when close(fd) on a fifo fails with EINTR, the file descriptor is not really closed

bugzilla-noreply at freebsd.org
Wed Sep 16 22:34:14 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203162

            Bug ID: 203162
           Summary: when close(fd) on a fifo fails with EINTR, the file
                    descriptor is not really closed
           Product: Base System
           Version: 10.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: victor.stinner at gmail.com

Created attachment 161126
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=161126&action=edit
Program to reproduce the bug

tl;dr The close() syscall does not correctly close a FIFO file descriptor when
close() is interrupted by a signal.

Hi,

I'm working on the Python project. Python 3.5 now retries syscalls when a
syscall fails with EINTR. This change is described in
https://www.python.org/dev/peps/pep-0475/

The associated unit test "test_eintr" sometimes hangs on our FreeBSD buildbots
(FreeBSD 9 and 10). It took me several days to identify that it is test_open()
of test_eintr that hangs. The test seems to hang when the close() syscall fails
with EINTR.

By the way, the BSD C library ignores EINTR in this case, so the caller of
close() is not aware that the syscall failed with EINTR. Python also decided to
ignore EINTR on close() and dup2(), because the file descriptor is closed
anyway. This is explained in PEP 475 (see the link above).
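A minimal sketch of the policy described above, assuming (as PEP 475 does) that the descriptor has already been released when close() reports EINTR. The helper name close_ignore_eintr() is hypothetical; it is not taken from Python or from the BSD C library:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close fd and treat EINTR as success.
 * Assumption (PEP 475): the descriptor is already released when close()
 * fails with EINTR, so retrying could close an unrelated descriptor that
 * another thread opened with the same number in the meantime. */
int close_ignore_eintr(int fd)
{
    if (close(fd) == 0)
        return 0;
    if (errno == EINTR)
        return 0;   /* do NOT retry: the fd number may already be reused */
    return -1;      /* other errors (e.g. EBADF) are reported to the caller */
}
```

This bug report concerns a case where that assumption does not seem to hold: a FIFO descriptor that is not really closed after close() fails with EINTR.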

The test checks that Python correctly retries open() when the function fails
with EINTR. The test uses two processes, which I will call the parent process
and the child process. To get an EINTR on open(), the test uses a FIFO created
by mkfifo(). The parent calls mkfifo() and immediately tries to open the FIFO
for writing: open() blocks until the child opens the FIFO for reading. Both
processes use setitimer() to inject SIGALRM signals every 10 ms. The child
process sleeps 100 ms, opens the FIFO for reading and then closes it.
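The scenario above can be sketched in C as follows. This is not the attached program: it is a minimal sketch assuming a SIGALRM handler installed without SA_RESTART (so that blocking syscalls fail with EINTR instead of being restarted), and run_fifo_test(), open_retry() and the FIFO path are hypothetical names. Interval timers are reset in the child by fork(), which is why each process arms its own timer.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* The handler does nothing: its only purpose is to interrupt syscalls. */
static void on_alarm(int sig) { (void)sig; }

/* Install the handler WITHOUT SA_RESTART, then send SIGALRM every 10 ms. */
static void start_alarm_timer(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);
    struct itimerval it = {
        .it_interval = { .tv_sec = 0, .tv_usec = 10000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
    };
    setitimer(ITIMER_REAL, &it, NULL);
}

static void stop_alarm_timer(void)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);
    setitimer(ITIMER_REAL, &it, NULL);
}

/* Retry open() while it fails with EINTR, as Python does since PEP 475. */
static int open_retry(const char *path, int flags)
{
    int fd;
    do {
        fd = open(path, flags);
    } while (fd < 0 && errno == EINTR);
    return fd;
}

/* Returns 0 if both sides opened and closed the FIFO, -1 otherwise. */
int run_fifo_test(const char *path)
{
    unlink(path);
    if (mkfifo(path, 0600) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                          /* child: the reader */
        start_alarm_timer();
        struct timespec req = { .tv_sec = 0, .tv_nsec = 100000000 }, rem;
        while (nanosleep(&req, &rem) != 0 && errno == EINTR)
            req = rem;                       /* sleep 100 ms despite SIGALRM */
        int fd = open_retry(path, O_RDONLY);
        if (fd < 0)
            _exit(1);
        close(fd);                           /* EINTR here is ignored (PEP 475) */
        _exit(0);
    }
    start_alarm_timer();                     /* parent: the writer */
    int fd = open_retry(path, O_WRONLY);     /* blocks until the child opens */
    stop_alarm_timer();
    if (fd >= 0)
        close(fd);
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    unlink(path);
    if (fd < 0 || r != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Running run_fifo_test() in a loop is the kind of stress that, according to this report, eventually hangs on FreeBSD.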

The attached tarball contains a C program based on the Python unit test.

To reproduce the bug, run ./test.sh several times in different terminals,
passing a different number to each run (the number is used to name the truss
log file). The program should hang after 1 to 5 minutes.

You may have to stop and restart the script: truss creates a ghost process for
the child process, which becomes <defunct>, so the limit on the number of
processes is quickly reached.

I observed two cases: either the test hangs (no more output), or it slowly
fills the terminal with "@". An "@" character is written each time open() fails
with EINTR in the child process (only in the child process; this case produces
no output in the parent process).

I'm quite sure that truss has bugs and fails to log syscalls correctly in the
parent and child processes. To work around these truss bugs, I wrote my program
so that open(path, O_WRONLY) returns fd 3 in the parent process and
open(path, O_RDONLY) returns fd 4 in the child process. The fd number therefore
tells you whether a logged syscall comes from the parent or the child process.
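The report does not say how the program forces these descriptor numbers. One possible approach, sketched here as an assumption, is to dup2() the descriptor returned by open() onto the wanted number. open_at_fd() is a hypothetical helper, not code from the attached program:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path and move the descriptor to target_fd, so
 * that syscall traces can be attributed to a process by fd number.
 * Note: dup2() silently closes any descriptor already open at target_fd.
 * Returns target_fd on success, -1 on error (errno is set). */
int open_at_fd(const char *path, int flags, int target_fd)
{
    int fd;
    do {
        fd = open(path, flags);
    } while (fd < 0 && errno == EINTR);
    if (fd < 0)
        return -1;
    if (fd == target_fd)
        return fd;
    if (dup2(fd, target_fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);   /* keep only the copy at target_fd */
    return target_fd;
}
```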

When the close() syscall fails with EINTR, fstat(fd) then fails with EBADF, so
the file descriptor does appear to be closed.
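The check mentioned above can be written as a small helper; fd_is_closed() is a hypothetical name used only for this sketch. It only inspects the descriptor table of the calling process, so it says nothing about the state of the FIFO itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if fd is not open in this process, using the
 * same check as the report (fstat() failing with EBADF). */
bool fd_is_closed(int fd)
{
    struct stat st;
    return fstat(fd, &st) < 0 && errno == EBADF;
}
```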

Note: I reproduced the bug in a VM running FreeBSD 10.1-RELEASE-p6 with a
single core (1 virtual CPU in fact).

Note: I have been following the evolution of the FreeBSD kernel through the
Python test suite. I noticed that FreeBSD has made *huge* progress in handling
threads and signals. Congrats :-)
