Fatal error 'mutex is on list' at line 139 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 35)
Konstantin Belousov
kostikbel at gmail.com
Mon Mar 21 07:07:19 UTC 2016
On Mon, Mar 21, 2016 at 07:21:02AM +0200, Konstantin Belousov wrote:
> On Sun, Mar 20, 2016 at 09:28:13AM +0200, Oleg V. Nauman wrote:
> > Fatal error 'mutex 0x800632000 own 0x1885c 0x1885c is on list 0x0 0x80d46ebc0'
> > at line 155 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 2 )
> >
> > What I have got after applying this patch:
> >
> > #0 0x0000000805913d6a in thr_kill () from /lib/libc.so.7
> > #1 0x0000000805913d3b in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
> > #2 0x0000000805913ca9 in abort () at /usr/src/lib/libc/stdlib/abort.c:65
> > #3 0x0000000805639554 in _thread_exit (
> > fname=0x80563ac36 "/usr/src/lib/libthr/thread/thr_mutex.c", lineno=155,
> > msg=0x7fffffffd1c0 "mutex 0x800632000 own 0x1885c 0x1885c is on list 0x0
> > 0x80d46ebc0")
> > at /usr/src/lib/libthr/thread/thr_exit.c:182
> > #4 0x000000080562fe36 in mutex_assert_not_owned (m=0x800632000)
> > at /usr/src/lib/libthr/thread/thr_mutex.c:155
> > #5 0x0000000805630009 in enqueue_mutex (curthread=0x80cc15000, m=0x800632000)
> > at /usr/src/lib/libthr/thread/thr_mutex.c:400
> > #6 0x0000000805631665 in mutex_lock_sleep (curthread=0x80cc15000,
> > m=0x800632000,
> > abstime=0x7fffffffd358) at /usr/src/lib/libthr/thread/thr_mutex.c:535
> > #7 0x0000000805630280 in mutex_lock_common (m=0x800632000,
> > abstime=0x7fffffffd358,
> > cvattach=0) at /usr/src/lib/libthr/thread/thr_mutex.c:553
> > #8 0x000000080562f6be in __pthread_mutex_timedlock (mutex=0x810a00008,
> > abstime=0x7fffffffd358) at /usr/src/lib/libthr/thread/thr_mutex.c:583
> > ...
> > gdb) f 6
> > #6 0x0000000805631665 in mutex_lock_sleep (curthread=0x80cc15000,
> > m=0x800632000,
> > abstime=0x7fffffffd358) at /usr/src/lib/libthr/thread/thr_mutex.c:535
> > 535 enqueue_mutex(curthread, m);
> > (gdb) p *curthread
> > $1 = {tid = 100444, lock = {m_owner = 0, m_flags = 0, m_ceilings = {0, 0},
> > m_spare = {0, 0,
> > 0, 0}}, cycle = 0, locklevel = 0, critical_count = 0, sigblock = 0, tle
> > = {
> > tqe_next = 0x0, tqe_prev = 0x805841000 <_thread_list>}, gcle = {tqe_next =
> > 0x0,
> > tqe_prev = 0x0}, hle = {le_next = 0x0, le_prev = 0x80584b3c0}, wle =
> > {tqe_next = 0x0,
> > tqe_prev = 0x0}, refcount = 0, start_routine = 0x0, arg = 0x0, attr =
> > {sched_policy = 2,
> > sched_inherit = 4, prio = 0, suspend = 0, flags = 258, stackaddr_attr =
> > 0x7ffffdfff000,
> > stacksize_attr = 33554432, guardsize_attr = 4096, cpuset = 0x0, cpusetsize
> > = 0},
> > cancel_enable = 1, cancel_pending = 0, cancel_point = 0, no_cancel = 0,
> > cancel_async = 0,
> > cancelling = 0, sigmask = {__bits = {0, 0, 0, 0}}, unblock_sigcancel = 0,
> > in_sigsuspend = 0, deferred_siginfo = {si_signo = 0, si_errno = 0, si_code =
> > 0, si_pid = 0,
> > si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0,
> > sival_ptr = 0x0,
> > sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0},
> > _timer = {
> > _timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0},
> > __spare__ = {
> > __spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}},
> > deferred_sigmask = {__bits = {
> > 0, 0, 0, 0}}, deferred_sigact = {__sigaction_u = {__sa_handler = 0x0,
> > __sa_sigaction = 0x0}, sa_flags = 0, sa_mask = {__bits = {0, 0, 0, 0}}},
> > deferred_run = 0, force_exit = 0, state = PS_RUNNING, error = 0, joiner =
> > 0x0, flags = 0,
> > tlflags = 2, mq = {{tqh_first = 0x0, tqh_last = 0x80cc151a0}, {tqh_first =
> > 0x0,
> > tqh_last = 0x80cc151b0}, {tqh_first = 0x0, tqh_last = 0x80cc151c0},
> > {tqh_first = 0x0,
> > tqh_last = 0x80cc151d0}}, ret = 0x0, specific = 0x800631000,
> > specific_data_count = 5,
> > rdlock_count = 0, rtld_bits = 0, tcb = 0x8006df108, cleanup = 0x0, ex = {
> > exception_class = 0, exception_cleanup = 0x0, private_1 = 0, private_2 =
> > 0},
> > unwind_stackend = 0x7ffffffff000, unwind_disabled = 0, magic = 3499860245,
> > report_events = 0, event_mask = 0, event_buf = {event = TD_EVENT_NONE, th_p
> > = 0, data = 0},
> > wchan = 0x0, mutex_obj = 0x0, will_sleep = 0, nwaiter_defer = 0,
> > defer_waiters = {
> > 0x0 <repeats 50 times>}, wake_addr = 0x80584b0c8, sleepqueue =
> > 0x80cc14040}
> >
> > (gdb) f 8
> > #8 0x000000080562f6be in __pthread_mutex_timedlock (mutex=0x810a00008,
> > abstime=0x7fffffffd358) at /usr/src/lib/libthr/thread/thr_mutex.c:583
> > 583 ret = mutex_lock_common(m, abstime, 0);
> > (gdb) p *mutex
> > $2 = (pthread_mutex_t) 0x8000000000000001
> > (gdb) p m
> > $3 = (struct pthread_mutex *) 0x800632000
> > (gdb) p *m
> > $4 = {m_lock = {m_owner = 100444, m_flags = 1, m_ceilings = {0, 0}, m_spare =
> > {0, 0, 0, 0}},
> > m_flags = 1, m_owner = 100444, m_count = 0, m_spinloops = 0, m_yieldloops =
> > 0, m_qe = {
> > tqe_next = 0x0, tqe_prev = 0x0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}}
> >
>
> Thank you, this is useful, but misterious. Below is my current
> (mis-)understanding of the problem based on your debugging data. I wrote
> this mostly to myself, and continue looking at it, but might be somebody
> else is curious.
>
> Note that there is inconsistency in the debugging output from libthr,
> which reports that m->m_qe.tqe_next is not NULL (and it looks like a
> reasonable pointer), while gdb reports that both pointers from m_qe are
> correctly NULLs. The m_lock.m_owner and m_owner values are fine both
> from libthr output and from gdb output. Additional interesting detail is
> that m_qe.tqe_prev is NULL, which agrees with the order of zeroing in
> mutex_init_link().
>
> So it sounds as if dequeue_mutex() is executed after the m_lock.m_owner
> relinguished ownership of the mutex. Visual code inspection does not
> reveal such pathes, it would be huge bug anyway. I currently do not see
> how it is possible at all. Either some additional code is somewhere, or
> memory ordering is broken, or memory is corrupted.
>
> That said, I will flush my patch queue for the libthr shortly, which
> includes the fork/pshared fix you already tested. Ideally, I need the
> minimal stand-alone binary or source which demostrates the problem. I
> understand that it is hard to get with KDE.
>
> As an aside question, what hardware is used to reproduce the assert ?
Please, use the stock libthr from r297141, and apply the following
debugging kernel patch. I am interested whether the messages appear
in dmesg/console for your load.
diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
index 9474b0b..7718870 100644
--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -3629,6 +3629,7 @@ umtx_shm_create_reg(struct thread *td, const struct umtx_key *key,
reg = umtx_shm_find_reg(key);
if (reg != NULL) {
+printf("pid %d creating existing key 1\n", td->td_proc->p_pid);
*res = reg;
return (0);
}
@@ -3650,6 +3651,7 @@ umtx_shm_create_reg(struct thread *td, const struct umtx_key *key,
if (reg1 != NULL) {
mtx_unlock(&umtx_shm_lock);
umtx_shm_free_reg(reg);
+printf("pid %d creating existing key 2\n", td->td_proc->p_pid);
*res = reg1;
return (0);
}
More information about the freebsd-current
mailing list