Fatal error 'mutex is on list' at line 139 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 35)
Konstantin Belousov
kostikbel at gmail.com
Mon Mar 21 11:23:04 UTC 2016
On Mon, Mar 21, 2016 at 12:15:15PM +0200, Oleg V. Nauman wrote:
> OK, but please take a look at what I have found (it makes me think that the
> problem is within the compiled KDE code).
> The failure point within the KDE code is the same (at least this is true for
> the coredumps generated today):
>
> #7 0x0000000805a2f6be in __pthread_mutex_timedlock (mutex=0x81b200008,
> abstime=0x7fffffffd458) at /usr/src/lib/libthr/thread/thr_mutex.c:583
> #8 0x000000080443c4b0 in pthreadTimedLock::lock (this=0x81777b680)
> at
> /usr/ports/x11/kdelibs4/work/kdelibs-4.14.3/kdecore/util/kshareddatacache_p.h:252
> ....
> (gdb) f 8
> #8 0x000000080443c4b0 in pthreadTimedLock::lock (this=0x81777b680)
> at
> /usr/ports/x11/kdelibs4/work/kdelibs-4.14.3/kdecore/util/kshareddatacache_p.h:252
> 252 return pthread_mutex_timedlock(&m_mutex, &timeout) == 0;
> (gdb) p &m_mutex
> $1 = (pthread_mutex_t *) 0x81b200008
> (gdb) p m_mutex
> $2 = (pthread_mutex_t &) @0x81b200008: 0x8000000000000001
This is correct. The value is the special cookie set for process-shared
locks; the actual lock exists elsewhere.
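
For readers unfamiliar with process-shared locks, here is a minimal sketch
of how an application typically creates one. It is not the kdelibs code:
the helper names create_shared_mutex() and destroy_shared_mutex() are
hypothetical, and the anonymous shared mapping is an assumption standing in
for the cache mapping (in the backtrace above, shm = 0x81b200000 and the
mutex sits at 0x81b200008 inside it).

```cpp
#include <pthread.h>
#include <sys/mman.h>

// Hypothetical helper (not from kdelibs or libthr): place a mutex in a
// shared mapping and mark it process-shared.
pthread_mutex_t *create_shared_mutex()
{
    // Assumption: an anonymous shared mapping stands in for the cache file.
    void *mem = mmap(nullptr, sizeof(pthread_mutex_t),
                     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return nullptr;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        // Without this attribute the mutex is private to one process.
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(static_cast<pthread_mutex_t *>(mem), &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return nullptr;
    }
    return static_cast<pthread_mutex_t *>(mem);
}

// Hypothetical counterpart: destroy the mutex and drop the mapping.
void destroy_shared_mutex(pthread_mutex_t *m)
{
    pthread_mutex_destroy(m);
    munmap(m, sizeof(pthread_mutex_t));
}
```

Printing such a pthread_mutex_t from gdb shows only what the library keeps
in the user-visible word, which is why the value above looks like a cookie
rather than a lock structure.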
> (gdb) p &timeout
> $3 = (timespec *) 0x6
This might be some gdb issue. Anyway, the timeout value is not the problem.
> (gdb) p timeout
> Cannot access memory at address 0x6
> (gdb)
>
> It seems that both m_mutex and timeout are wrong
m_mutex is fine, as I noted above.
>
> The class which generates coredumps looks like:
>
> #if defined(KSDC_THREAD_PROCESS_SHARED_SUPPORTED) && defined(KSDC_TIMEOUTS_SUPPORTED)
> class pthreadTimedLock : public pthreadLock
> {
> public:
> pthreadTimedLock(pthread_mutex_t &mutex)
> : pthreadLock(mutex)
> {
> }
>
> virtual bool lock()
> {
> struct timespec timeout;
>
> // Long timeout, but if we fail to meet this timeout it's probably a cache
> // corruption (and if we take 8 seconds then it should be much much quicker
> // the next time anyways since we'd be paged back in from disk)
> timeout.tv_sec = 10 + ::time(NULL); // Absolute time, so 10 seconds from now
> timeout.tv_nsec = 0;
>
> return pthread_mutex_timedlock(&m_mutex, &timeout) == 0;
> }
> };
> #endif
>
> It is called by:
>
> (gdb) f 9
> #9 0x000000080443c8a8 in KSharedDataCache::Private::CacheLocker::cautiousLock
> (
> this=0x7fffffffd5f0)
> at
> /usr/ports/x11/kdelibs4/work/kdelibs-4.14.3/kdecore/util/kshareddatacache.cpp:1259
> 1259 while (!d->lock() && !isLockedCacheSafe()) {
> (gdb) p *d
> $4 = {m_cacheName = {static null = {<No data fields>}, static shared_null =
> {ref = {
> _q_value = 2731}, alloc = 0, size = 0, data = 0x6192ca
> <QString::shared_null+26>,
> clean = 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity =
> 0, reserved = 0,
> array = {0}}, static shared_empty = {ref = {_q_value = 50}, alloc = 0,
> size = 0,
> data = 0x805105c3a <QString::shared_empty+26>, clean = 0, simpletext =
> 0,
> righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array =
> {0}},
> d = 0x8176e8180, static codecForCStrings = 0x0}, shm = 0x81b200000,
> m_lock = {<QtSharedPointer::ExternalRefCount<KSDCLock>> =
> {<QtSharedPointer::Basic<KSDCLock>> = {value = 0x81777b680}, d = 0x81777b6c0},
> <No data fields>}, m_mapSize = 10547304,
> m_defaultCacheSize = 10485760, m_expectedItemSize = 0, m_expectedType =
> LOCKTYPE_MUTEX}
> (gdb) p d
> $5 = (KSharedDataCache::Private *) 0x8176d2030
>
> Well, I understand that unwinding the KDE code is not a task for humans...
>
> The hardware is ASUS X552C notebook, Ivybridge, amd64
> I noticed massive coredumps after x11/kdelibs4 recompilation with clang 3.8.0
> so it is possible that it is a problem with code generation.
> It does not depend on the optimization level (at least it exhibits the same
> behavior for both -O2 and -O0).
> The only CPU/optimization/code generation specific setting is
> CPUTYPE?=nehalem
> in make.conf
In other words, there is no virtualization involved.
I think that the problem at hand is not related to the clang update. You
recently rebuilt the KDE libs, which probably triggered detection of a new
feature, process-shared locks in our libthr. Before that, older HEAD
did not expose pshared as an implemented option. Somehow the implementation
and KDE's expectations do not match, and the asserts in libthr catch that.
Anyway, please apply the debugging patch I posted in the previous mail.
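
To illustrate how a rebuild can start using process-shared locks without
any source change, here is a sketch of the usual POSIX feature test. It is
not the actual kdelibs check behind KSDC_THREAD_PROCESS_SHARED_SUPPORTED;
the two function names are hypothetical. The point is only that the
compile-time answer depends on what the system headers advertise when the
port is built.

```cpp
#include <unistd.h>

// Hypothetical helper: true if the headers seen at build time advertise
// the option unconditionally (macro defined with a value greater than 0).
constexpr bool pshared_advertised_at_compile_time()
{
#if defined(_POSIX_THREAD_PROCESS_SHARED) && (_POSIX_THREAD_PROCESS_SHARED + 0) > 0
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: ask the running system instead of the headers.
// sysconf() returns -1 when the option is not supported.
bool pshared_supported_at_run_time()
{
    return sysconf(_SC_THREAD_PROCESS_SHARED) > 0;
}
```

Code built against headers that did not advertise the option takes the
fallback path; the same code rebuilt later against updated headers may
take the process-shared path.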