[Bug 271490] Deadlock between _rtld_atfork_pre and _thr_attr_init

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 18 May 2023 13:22:04 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271490

            Bug ID: 271490
           Summary: Deadlock between _rtld_atfork_pre and _thr_attr_init
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: threads@FreeBSD.org
          Reporter: kj@kjtsanaktsidis.id.au

Created attachment 242250
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=242250&action=edit
GDB output from my stuck process

I'm working on fixing some test failures in Ruby on FreeBSD, and I've found
what I think is a deadlock in FreeBSD between two threads, one of which is
forking and one of which is just starting.

The Ruby test which does this looks something like the following:

```ruby
require 'timeout'
def test_daemon_no_threads
  data = Timeout.timeout(3) do
    IO.popen("-") do |f|
      # Parent: collect whatever the forked child prints.
      break f.readlines.map(&:chomp) if f
      # Child: start a thread, then fork again via daemon(3).
      th = Thread.start { sleep 3 }
      Process.daemon(true, true)
      puts "this is sometimes never reached!"
    end
  end
end
```

The test forks; then, in the child, it starts a thread and forks again (via a
call to daemon(3)). On my machine, this semi-reliably produces a deadlock
inside the call to `Process.daemon`, before the second fork actually takes
place. The thread calling `daemon(3)` gets stuck acquiring locks in
`_rtld_atfork_pre`, whilst the new thread (the one started by
`Thread.start { sleep 3 }`) is stuck inside jemalloc's `extent_deactivate` as
part of a call to `_thr_attr_init`.
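For anyone trying to reproduce this outside the Ruby test harness, here is a
minimal standalone sketch of the same sequence. It assumes MRI on a platform
with `fork`, swaps `IO.popen("-")` for `fork` plus `IO.pipe`, and
`daemon_with_thread_once` is a hypothetical helper name. It only exercises the
same fork, thread-start, `daemon(3)` ordering; it is not confirmed to trigger
the hang on FreeBSD.

```ruby
require "timeout"

# Hypothetical helper (not part of Ruby's test suite): runs the
# fork -> Thread.start -> Process.daemon sequence once and reports whether
# the daemonized process got past daemon(3) before the deadline.
def daemon_with_thread_once(deadline = 3)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    begin
      Thread.start { sleep deadline } # the thread that is "just starting"
      Process.daemon(true, true)      # nochdir, noclose, as in the test
      writer.puts "reached"           # only the daemonized process gets here
      writer.close                    # flush now, since exit! skips Ruby cleanup
    ensure
      exit!(0)                        # never run the parent's at_exit handlers
    end
  end
  writer.close                        # so gets sees EOF if nobody writes

  begin
    line = Timeout.timeout(deadline) { reader.gets }
    line&.chomp == "reached" ? :ok : :no_output
  rescue Timeout::Error
    begin
      Process.kill(:KILL, pid)        # per the report, the hang precedes the second fork
    rescue Errno::ESRCH
      # the direct child had already exited; nothing to kill
    end
    :hung
  ensure
    reader.close
    Process.wait(pid)                 # reap the direct child
  end
end
```

Calling it repeatedly, e.g. `loop { break unless daemon_with_thread_once == :ok }`,
mirrors re-running the test until it hangs; dropping the `Process.kill` call
would leave the stuck process around for gdb.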

---

To reproduce this from the Ruby source, check out the latest Ruby master from
github.com/ruby/ruby.git, do the autoconf/configure dance, run `make`, and then
run:

```
while ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- \
    ./test/runner.rb test/ruby/test_process.rb -n test_daemon_no_threads; do
  echo "ok"
done
```

On my machine, this will eventually get stuck forever in the test. 

N.B.: if you find the test failing for the reason shown in the log below, that's
a _different_ issue, in Ruby itself, which I am trying to fix (that's how I
discovered this deadlock in the first place):
http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230517T003001Z.fail.html.gz

---

I've attached a pair of backtraces I got from attaching gdb to the process. I
don't really know where to go from here to debug the issue, though: I can't
seem to make it happen in a similar C program, and by inspection I can't see
why there's a deadlock at all. I can't work out why the forking thread would
hold any jemalloc locks, and I _really_ can't see why the new thread would hold
any rtld locks.

Any thoughts? Thanks!
